Research Collection
Doctoral Thesis
Quality aspects of multimodal communication: user perception and acceptance thresholds
Author(s): Zuberbühler, Hans-Jörg
Publication Date: 2003
Permanent Link: https://doi.org/10.3929/ethz-a-004583162
Rights / License: In Copyright - Non-Commercial Use Permitted
ETH Library
DISS. ETH NO. 15124
QUALITY ASPECTS OF MULTIMODAL COMMUNICATION: USER PERCEPTION AND ACCEPTANCE THRESHOLDS
A dissertation submitted to the
SWISS FEDERAL INSTITUTE OF TECHNOLOGY ZURICH
for the degree of
Doctor of Natural Sciences
presented by
HANS-JÖRG ZUBERBÜHLER
Dipl. Umwelt-Natw. ETH
born 11.02.1968
citizen of Urnäsch (AR)
accepted on the recommendation of
Prof. Dr. Dr. Helmut Krueger, examiner
Prof. Dr. Albert Kündig, co-examiner
Dr. Sissel Guttormsen Schär, co-examiner
2003
Acknowledgement
This thesis would not exist without the support and cooperation of a number of people,
whom I would like to thank:
First and foremost, Prof. Dr. Dr. Helmut Krueger, my promoter, for his keen observation
and his valuable advice. He provided an excellent research environment for the
achievement of this thesis.
Furthermore, Prof. Dr. Albert Kündig for the hours we spent discussing, and for
funding the QED project, in the frame of which I wrote my thesis.
Many thanks also to Dr. Sissel Guttormsen Schär, who introduced me to the world of
scientific research, and to my other colleagues in the research group man-machine
interaction, who have contributed in one way or another to making my time at the ETH one
that I will always look back on with great pleasure: Marc Arial, Morten Fjeld, Christine
Hitzke, Pamela Ravasio, Sam Schluep, and Phillipe Zimmermann.
Thanks also go to the QED team members Alexander Braun and Patrik Estermann
for the work they have done to implement the videoconference setup and to run
experiments.
A special thanks to Kent Riopelle, who proofread my thesis and provided valuable
feedback to improve its comprehensibility.
Finally, I would like to thank my parents and friends, who supported and encouraged
me. Most of all, I thank my partner Ruth for her continuous and loving support.
Zürich, August 2003 Hans-Jörg Zuberbühler
Table of Contents
Abstract
Zusammenfassung
1 Transfer to Practice
  1.1 Regarding Human-Computer Interaction (HCI)
  1.2 Regarding Human-Human Interaction (HHI)
2 Introduction
  2.1 Background and Aims
  2.2 Scope of Investigation
    2.2.1 Delay as Quality of Service (QoS) Parameter
    2.2.2 Published Results for Perception and Acceptance of Delay
    2.2.3 A Psychophysical Approach
  2.3 Structure of the Thesis
3 Theory
  3.1 A Taxonomy of Communication
    3.1.1 Social context
    3.1.2 Orientation
    3.1.3 Coding
    3.1.4 Modality
    3.1.5 Timing
    3.1.6 Exemplification of the interpersonal communication model
  3.2 Processing Time of Auditory and Visual Stimuli
    3.2.1 Indirect: Reaction Time Differences
    3.2.2 Direct: Event-Related Potentials (ERPs)
  3.3 Mental Representation of Time
    3.3.1 Low Frequency Processing
    3.3.2 High Frequency Processing
  3.4 Neural and Cognitive Models of Time Perception
    3.4.1 Labelled Lines
    3.4.2 Population Clocks (Neural Networks)
    3.4.3 Pacemaker-Switch-Accumulator Models
  3.5 Psychophysical Theory for Measuring Thresholds
    3.5.1 Testing paradigms
    3.5.2 Specification of the Psychometric Function ψ = f(φ)
    3.5.3 Adaptive Psychophysical Procedures
4 Experiments
  4.1 In Human-Computer Interaction (HCI) Mode
    4.1.1 Experimental Setup
    4.1.2 Procedure
    4.1.3 HCI-Results
  4.2 In Human-Human Interaction (HHI) Mode
    4.2.1 Experimental Setup
    4.2.2 Procedure
    4.2.3 HHI-Results
5 Discussion and Conclusions
  5.1 Regarding Relative Delays
    5.1.1 In Human-Computer Interaction (HCI)
  5.2 Regarding Absolute Delays
    5.2.1 In Human-Computer Interaction (HCI)
    5.2.2 In Human-Human Interaction (HHI)
  5.3 Further Research
    5.3.1 Relative Delay
    5.3.2 Absolute Delay
Annex
  Developed Software: The best-PEST Calculator
    Description
    Monte-Carlo-Simulations
References
Glossary
Index
Abstract
Recent trends in telecommunication networks indicate a shift away from the use of
circuit-switched networks towards the use of packet-switched networks. This new
networking environment will present end users with new characteristics like variations in
transmission delays and bit rates, as well as potential loss of data packets. These
characteristics represent a challenge in the design and use of packet-switched networks, since
they may lead to user impairments, depending on the kind of source coding and
compression used in the end-systems.
It is generally agreed that very little is known about user expectations or perceptive
mechanisms and user behaviour in this new situation. As a consequence, it is presently
difficult to base network engineering on proper traffic forecasts and real user
requirements. This lack of knowledge is the driving force behind our work, which aims to
examine user perception and acceptance of the Quality of Service (QoS) parameters absolute
and relative delay (also referred to as roundtrip delay and synchronisation error).
In this thesis we investigated the perception and acceptance thresholds for particular
delay parameters using psychophysical methods, i.e. thresholds are obtained by means
of empirical determinations applying either 2-alternative forced-choice or yes-no paradigms,
and using the adaptive psychophysical procedure called best-PEST. The experiments are
conducted in the interaction modes Human-Computer-Interaction (HCI) and
Human-Human-Interaction (HHI), which evoke different delay perceptions. HCI delay thresholds
are obtained using an experimental set-up that includes stimulus presentation, best-PEST
algorithm, and data acquisition. It is implemented using the object-oriented scripting
language Lingo. The experiments conducted in the HHI mode comprise threshold
determinations in which the experimental subjects interact with each other over a
videoconference that uses an ATM-network infrastructure. The experimental set-up consists
of two or three videoconference stations connected via fibre passing through a system called
ARES, which emulates the behaviour of ATM channels in real-time with the possibility
to emulate performance degradations, such as delay or errors.
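The idea behind an adaptive maximum-likelihood procedure of this kind can be sketched in a
few lines of Python. This is a simplified illustration, not the Lingo implementation used in
the thesis: the logistic shape with a fixed spread of 20 ms, the 5 ms candidate grid, the
number of trials, and all function names are assumptions made for the example. After every
response, the next stimulus is presented at the threshold value that currently has the
highest likelihood.

```python
import math
import random

def p_detect(level, threshold, spread=20.0):
    """Assumed logistic psychometric function (yes-no paradigm, no guessing)."""
    return 1.0 / (1.0 + math.exp(-(level - threshold) / spread))

def ml_threshold(trials, candidates, spread=20.0):
    """Return the candidate threshold with the highest log-likelihood."""
    def log_likelihood(threshold):
        total = 0.0
        for level, detected in trials:
            p = p_detect(level, threshold, spread)
            p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
            total += math.log(p if detected else 1.0 - p)
        return total
    return max(candidates, key=log_likelihood)

def run_adaptive(respond, candidates, n_trials=40, spread=20.0):
    """Present each stimulus at the current maximum-likelihood estimate."""
    trials = []
    level = candidates[len(candidates) // 2]  # start in the middle of the range
    for _ in range(n_trials):
        trials.append((level, respond(level)))
        level = ml_threshold(trials, candidates, spread)
    return level

# Simulated observer with a hypothetical true threshold of 100 ms
random.seed(1)
TRUE_THRESHOLD = 100.0

def simulated_observer(level):
    return random.random() < p_detect(level, TRUE_THRESHOLD)

estimate = run_adaptive(simulated_observer, candidates=list(range(0, 301, 5)))
print(f"Estimated threshold: {estimate} ms")
```

Because every trial is placed at the current best estimate, most stimuli end up close to
the threshold, which is where a response carries the most information about it.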
In the HCI mode the following thresholds are determined:
• Relative delay between auditory stimuli preceding visual stimuli (AV).
• Relative delay between visual stimuli preceding auditory stimuli (VA).
• Absolute delay between voice input and visual computer-generated response (VoiVis).
• Absolute delay between mouse input and visual computer-generated response (MouVis).
In the HHI mode the following thresholds are determined:
• Absolute delay in basic auditory interaction between two subjects (AudBas).
• Absolute delay in basic visual interaction between two subjects (VisBas).
• Absolute delay in realistic audio-visual interaction between three subjects (AudVisReal).
• Absolute delay in realistic auditory interaction between three subjects (AudReal).
The thresholds for relative delays are 71 (±17) ms for the AV condition, and 105
(±25) ms for the VA condition. The thresholds for absolute delay in HCI are 115 (±23)
ms for the VoiVis condition, and 78 (±14) ms for the MouVis condition. In HHI the
thresholds for absolute delays are 216 (±44) ms for the AudBas condition, and 237 (±92)
ms for the VisBas condition. When subjects accomplish a realistic task, the perception
threshold is 1220 ms and the acceptance threshold is 2080 ms in the AudVisReal condition.
In the AudReal condition the perception threshold is 970 (±310) ms, and the acceptance
threshold is 1760 (±410) ms. Age and gender of the experimental subjects have no
significant effect (p>0.05) on these results.
To obtain psychometric functions, the experimental data of each condition are fitted using
a logistic model. The benefit of such functions is that network planners, as well as
content and service providers, are given a means to estimate what percentage of users
can be expected to detect and to reject a specific delay. This 'political' question is
influenced by economic considerations, namely which price/performance ratio is intended
to be offered to the user.
Furthermore, the relative delay thresholds are discussed in the light of neural
processing times for different modalities, and the absolute delay thresholds are discussed
with regard to the task dependency represented by different degrees of interactivity.
Zusammenfassung (Summary)
Telecommunication networks are being converted from circuit-switched to packet-switched
networks. As a result, users are confronted with changed network characteristics, such as
variable throughput and transmission delay, but also with losses of data packets. These
new characteristics pose a challenge for the design and use of packet-switched networks,
since they can impair the familiar communication process.
To date, little reliable knowledge exists in this field, neither about how users perceive
this new situation nor about how they behave overall. This complicates the design and
dimensioning of telecommunication networks, since well-founded assumptions about user
needs and reliable forecasts of network load are not available. The knowledge gap
described is the driving force behind the present work, which investigates the perception
and acceptance of the two Quality of Service parameters absolute and relative delay.
The perception and acceptance thresholds of the individual delay parameters are determined
in empirical experiments using psychophysical methods. Either the 2-alternative
forced-choice or the yes-no paradigm is applied, together with the adaptive psychophysical
procedure best-PEST. The experiments are divided into the two interaction modes
Human-Computer-Interaction (HCI) and Human-Human-Interaction (HHI), which evoke different
delay perceptions. To determine the HCI delay thresholds, an experimental set-up is used
that combines stimulus presentation, best-PEST algorithm, and data acquisition, and that is
programmed in the object-oriented scripting language Lingo. Experiments in the HHI mode,
on the other hand, are carried out using a videoconference application that runs over an
emulated ATM network. This experimental set-up consists of two or three videoconference
stations connected via optical fibre to the so-called ARES system. (The ARES system
emulates the real-time behaviour of ATM channels and makes it possible to simulate specific
performance degradations with respect to delay and error behaviour.)
In the HCI mode the following thresholds are determined:
• Relative delay between auditory stimuli preceding visual stimuli (AV).
• Relative delay between visual stimuli preceding auditory stimuli (VA).
• Absolute delay between voice input and visual computer-generated response (VoiVis).
• Absolute delay between mouse input and visual computer-generated response (MouVis).
In the HHI mode the following thresholds are determined:
• Absolute delay in basic auditory interaction between two subjects (AudBas).
• Absolute delay in basic visual interaction between two subjects (VisBas).
• Absolute delay in realistic audio-visual interaction between three subjects (AudVisReal).
• Absolute delay in realistic auditory interaction between three subjects (AudReal).
The thresholds for relative delays are 71 (±17) ms in the AV condition and 105 (±25) ms
in the VA condition. The thresholds for absolute delay in HCI are 115 (±23) ms in the
VoiVis condition and 78 (±14) ms in the MouVis condition. In HHI the thresholds for
absolute delay are 216 (±44) ms in the AudBas condition and 237 (±92) ms in the VisBas
condition. When the subjects have to reproduce realistic conversational situations, their
perception threshold is 1220 ms and their acceptance threshold is 2080 ms (AudVisReal
condition). In the AudReal condition these values are 970 (±330) ms for perception and
1760 (±410) ms for acceptance. Neither the age nor the gender of the subjects has a
significant influence (p>0.05) on the thresholds.
To obtain psychometric functions from the experimental data, logistic curves are fitted
for all conditions. The benefit of these functions is that network planners, as well as
content and service providers, can estimate what share of users will notice and/or reject
particular delay values. This 'political' question is decisively influenced by economic
considerations, namely which price/performance ratio is to be offered to the customers.
Furthermore, the relative delay thresholds are discussed in the light of the neural
processing speed for different modalities, and the absolute delay thresholds are discussed
with regard to their dependency on the tasks to be carried out, which are characterised by
different degrees of interactivity.
1 Transfer to Practice
This chapter compiles the results of the thesis that are directly transferable to fields of
practice. At first a brief summary of the background and then the motivation for the thesis
is presented. Subsequently qualitative results are discussed, and finally quantitative
results are listed in diverse tables, each consisting of user percentages for particular
delay types, and for the two interaction modes, Human-Computer-Interaction (HCI) and
Human-Human-Interaction (HHI).
In recent times, the underlying technology of public network infrastructures has
experienced a radical change from circuit-switched to packet-switched technology. The
original reason for this change is that with new packet-switched technologies, for instance
IPᵃ, the infrastructure can be operated with better capacity, since several data streams
can be multiplexed. This is in contrast to traditional circuit-switched technologies, with
ISDN serving as the most service-wise advanced example, where certain circuits are reserved
for respective services. Another reason for the change is that packet-switching better
matches the characteristics of computer-generated data. This is crucial considering the
spread of computers acting as end-systems. Regarding the characteristics of these two
technologies it appears that - on the one hand - packet-switching results in higher
network efficiency, but - on the other hand - causes lower network predictability in terms
of the Quality of Service (QoS): Variations in transmission delays and bit rates, as well
as potential loss of data packets, are more likely to occur.
ᵃ All acronyms and abbreviations are explained in the glossary beginning on page 121.
These new characteristics may lead to user impairments, depending on the application
used. For instance, real-time applications like voice-over-IP (VoIP) or videoconferences
are considered most critical in terms of delay. In this context, the question is at which
threshold value a delay becomes perceivable, and at which threshold value it becomes
perturbing. In this thesis we decided to investigate these two delay thresholds by
means of a psychophysical approach. As a quantifiable result of the thesis, so-called
psychometric functions are obtained, which describe the user detectability and acceptance
of different delay values. These functions are listed at the end of this chapter. Before
that, we take a look at the results of the thesis that are of a more qualitative nature.
The experiments showed that perception and consequently acceptance of delays are
very much task-dependent. Therefore it is probably not helpful to recommend universal
threshold values; rather they should be suggested for different task categories. It seems
that the degree of interactivity acts as the most delay-sensitive property of any
communication scenario. For the time being, the choice of offered delay values should be
left to the business strategy of the service provider. In order to base this strategy on a
reliable foundation, it will be helpful to classify the abundance of relevant task
categories, and to assess the proper delay thresholds for these categories separately. With
such knowledge it will be possible to adjust the delay values according to the measured
degree of interactivity. With knowledge of the associated psychometric function, the delay
can be set according to a predefined (or negotiated) percentage of users perceiving or
accepting this particular delay.
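As a minimal sketch of this last step, assuming the psychometric function has the logistic
form used for the fits in this thesis, the delay that a given share of users detects can be
read off by inverting the curve. The function names and the parameters `phi50` (delay
detected by 50 % of users) and `spread` (slope parameter) are illustrative assumptions and
not part of the original software; the example values are chosen so that they roughly
reproduce the AV column of Table 1.

```python
import math

def detect_fraction(delay_ms, phi50, spread):
    """Logistic psychometric function: share of users detecting delay_ms."""
    return 1.0 / (1.0 + math.exp(-(delay_ms - phi50) / spread))

def delay_for_fraction(p, phi50, spread):
    """Inverse of the logistic: delay detected by a share p of users."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return phi50 + spread * math.log(p / (1.0 - p))

# Illustrative parameters (assumed, roughly matching the AV column of Table 1)
PHI50_AV = 77.0   # ms, delay detected by half of the users
SPREAD_AV = 21.8  # ms, slope parameter of the logistic curve

budget = delay_for_fraction(0.25, PHI50_AV, SPREAD_AV)
print(f"Delay detected by 25 % of users: {budget:.0f} ms")
```

A provider who negotiates that at most a quarter of the users may notice the asynchrony
would, under these assumed parameters, arrive at a budget close to the 53 ms listed in
Table 1.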
Additionally, the experiments of this thesis showed that delay perception and acceptance
are not only influenced by the degree of interactivity but also to a great extent by
the number of communication channels the application offers: it seems that the visual
channel in an audio-visual application acts as a distractor, i.e. the focus of attention
is divided between the audio and the visual channel. Thus, the gain of
'media richness' in audio-visual communication has to be paid for by a loss of focussed
perception. Furthermore, the number of participants in the communication event turned
out to be another distractor: it seems that the focus of attention is divided among all
communication members. With an increasing number of participants, this results in a
decrease of attentional resources for the detection of delay.
In summary it appears that the three following factors are responsible for the users'
delay requirements. All of them result in high delay tolerance, since they evoke poor
delay perception:
• Low degree of interactivity.
• Increasing number of participants.
• Transition from mono- to multimodal communication.
Suboptimal communication support, e.g. missing gaze awareness in the videoconference
technology, could be mentioned as a fourth factor that leads to higher delay
tolerance. Without gaze awareness the communication members are not sure when they
are addressed - unless they are explicitly verbally addressed. This again lowers the
degree of interactivity, and might be the reason why the results of the conducted
experiments suggest acceptable delay values for realistic audio-visual tasks that are well
above the values suggested elsewhere for audio-only tasks. Nevertheless, we have to bear
two things in mind:
• Users still have little practice with multiperson, multimodal telecommunication
services. With the upcoming use of such services, users will most
probably improve their delay perception skills (i.e. they will avail themselves
of free attentional resources that are no longer needed to cope with the new
technology).
• The psychophysical methods used in the experiments do not allow conclusions
about long-term effects regarding what Wilson calls user costs (Wilson et al.,
2000). Although users do not perceive a particular delay as disturbing, it may
subconsciously increase mental strain. A technology which evokes such
effects contradicts the user-centred paradigm.
The remainder of this chapter presents the quantitative results of the conducted
experiments. They provide insight into the users' perception performance for different
delay types in different modalities. For the reasons mentioned above, the results of
the realistic tasks (Table 4) should not be applied to situations where only two partners
communicate.
The results are divided into Human-Computer-Interaction (HCI) and
Human-Human-Interaction (HHI). They describe two fundamental interaction modes resulting
in different delay perception. A further distinction concerns the types of delay: Relative
delay is perceived between particular modalities, e.g. between the auditory and the visual
channel. This delay is sometimes called intermedia synchronisation, synchronisation error,
or lip synchronisation. The other type is called absolute delay. It is perceived only in
dialogue settings, between sending information and receiving an answer from the dialogue
partner. This delay is sometimes called roundtrip delay or return trip delay.
The benefit of the following tables is that network planners and content providers
are given a means to estimate what percentage of users can be expected to detect and
to reject a specific delay. This 'political' question is influenced by economic
considerations, namely which price/performance ratio is intended to be offered to the user.
1.1 Regarding Human-Computer Interaction (HCI)
In HCI, we ran experiments determining the relative and the absolute delays. Results
concerning relative delays are available for situations where audio precedes the associated
visual stimulus (condition AV), as well as for the opposite stimulus order (condition VA)
(see Table 1). The results are considered suitable for the most stringent requirements,
i.e. for tasks facilitating the perception of asynchrony.
Further results concern the detection of absolute delays in situations where users
experience a delay between their vocal input and a computer-generated visual response
(condition VoiVis), or between their mouse input and a computer-generated visual
response (condition MouVis) (see Table 2). These results are considered suitable for
applications that e.g. enable voice recognition, or that are driven by mouse pointer or
joystick inputs (e.g. database queries, browsing the WWW, or image processing software).
Additionally, the absolute delay thresholds in HCI can also be used to analyse the HHI
thresholds described later, since they represent a component inherent to all
network-mediated HHI.
Table 1 Relative delay values perceived by particular percentages of users.
Reading example: It can be expected that not more than 25 % of users will detect an
AV-delay of 53 ms, and a VA-delay of 74 ms, respectively.

Percentage of Users      Extent of Asynchrony when        Extent of Asynchrony when
Detecting Asynchrony     Auditory Precedes Visual (AV)    Visual Precedes Auditory (VA)
in HCI, ψ [%]            φ [ms]                           φ [ms]
 5                        12                               34
10                        29                               50
25                        53                               74
33                        61                               67
50                        77                               98
67                        92                              113
75                       101                              122
90                       125                              146
95                       141                              162
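The tabulated values can be cross-checked against the logistic model. The following sketch
assumes that the AV column of Table 1 lists quantiles of a fitted logistic function, so that
the delay is a linear function of logit(ψ). It is an illustration only and not the fitting
procedure used in the thesis; the function and variable names are chosen for the example.

```python
import math

# AV column of Table 1: percentage of users -> detected asynchrony in ms
TABLE1_AV = {5: 12, 10: 29, 25: 53, 33: 61, 50: 77, 67: 92, 75: 101, 90: 125, 95: 141}

def logit(p):
    """Log-odds of a proportion p with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def fit_logistic_quantiles(table):
    """Least-squares line delay = phi50 + spread * logit(p) through the quantiles."""
    xs = [logit(pct / 100.0) for pct in table]
    ys = [float(ms) for ms in table.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sxy / sxx
    phi50 = mean_y - spread * mean_x
    return phi50, spread

phi50, spread = fit_logistic_quantiles(TABLE1_AV)
print(f"phi50 = {phi50:.1f} ms, spread = {spread:.1f} ms")
```

The intercept `phi50` is the delay detected by half of the users, and `spread` states how
many milliseconds the delay grows per unit of log-odds.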
Table 2 Absolute delay values perceived by particular percentages of users.
Reading example: It can be expected that up to 75 % of users will detect an absolute
delay of 146 ms when interacting by voice. And up to 75 % of users will detect an absolute
delay of 96 ms when interacting by mouse clicks.

Percentage of Users      Absolute Delay in           Absolute Delay in
Detecting Absolute       VoiVis Interaction Mode     MouVis Interaction Mode
Delays in HCI, ψ [%]     φ [ms]                      φ [ms]
25                        50                          33
33                        67                          45
50                        98                          65
67                       129                          85
75                       146                          96
90                       195                         128
95                       228                         149
1.2 Regarding Human-Human Interaction (HHI)
In the HHI mode we ran experiments determining absolute delay thresholds. Results
are available for delay perception in basic auditory and visual interaction (conditions
AudBas and VisBas) (see Table 3). Since these experiments evoked a maximal degree of
interactivity, the results are considered to represent the minimal delay users can perceive
when interacting with each other. Further HHI experiments concern perception and acceptance
of absolute delays when users execute realistic tasks (conditions AudVisReal and
AudReal) (see Table 4). Note that these results hold only for the chosen task (free
discussion about a familiar topic) involving three participants. Since other tasks might
evoke different degrees of interactivity, they are assumed to allow for different delay
values.
Table 3 Absolute delay values perceived by particular percentages of users.
Reading example: It can be expected that up to 75 % of users will detect an absolute
delay of 228 ms in auditory HHI. And up to 75 % of users will detect an absolute delay of
239 ms in visual HHI.

Percentage of Users      Absolute Delay in Basic          Absolute Delay in Basic
Detecting Absolute       Auditory Interaction (AudBas)    Visual Interaction (VisBas)
Delays in HHI, ψ [%]     φ [ms]                           φ [ms]
 5                       109                              109
10                       131                              133
25                       164                              169
33                       175                              181
50                       196                              204
67                       217                              227
75                       228                              239
90                       261                              275
95                       283                              299
Table 4 Absolute delay values perceived by particular percentages of users.
Reading example: It can be expected that not more than 33 % of users will detect an
absolute delay of 734 ms when interacting audio-visually, and 535 ms when interacting
solely in the auditory mode. And not more than 33 % of users will find that an absolute
delay of 1610 ms is disturbing when interacting audio-visually, or 1430 ms when
interacting in the auditory mode. Note that these values hold only for the chosen task.

Percentage of Users       Perception of Absolute Delay       Acceptance of Absolute Delay
Detecting or Accepting    φ [ms]                             φ [ms]
Absolute Delays in HHI    Audio-visual    Audio-only         Audio-visual    Audio-only
ψ [%]                     (AudVisReal)    (AudReal)          (AudVisReal)    (AudReal)
 5                        n.a.            n.a.               n.a.             617
10                        n.a.            n.a.                629             889
25                         466             386               1350            1290
33                         734             535               1610            1430
50                        1220             800               2080            1690
67                        1710            1070               2550            1940
75                        1970            1220               2810            2090
90                        2730            1640               3530            2480
95                        3240            1920               4030            2750
2 Introduction
In this chapter we set out the reasons that motivated us to investigate quality issues in
multimodal real-time communication. To begin we briefly describe the state-of-the-art in
telecommunication technology and outline user impacts of such technology. Subsequently the
generic approach is narrowed down to fit the actual scope of the investigation, pointing out
the psychophysical approach for measuring delay perception by means of a global model of the
users' perception and acceptance of environmental stimuli. Lastly the structure of the
thesis is presented.
2.1 Background and Aims
Recent trends in telecommunication networks indicate a shift away from the use of
circuit-switched networks (with ISDN serving as the most technologically and service-wise
advanced example) towards the use of packet-switched networks (e.g. IP, ATM or
MPLS) (Coffman et al., 1998). Thus, most operators of public networks plan to migrate
their core network infrastructure to a universal, service-independent system operating in
a packet-switched mode. The original motive for using packet-switching stemmed from
the idea that the existing infrastructure could be used more efficiently by multiplexing
several data streams. In the meantime, however, it has become clear that this focus is no
longer sufficient. Rather, packet-switching better matches the characteristics of
computer-generated data. The traditional telecommunication networks (e.g. ISDN) have
essentially been characterized by the following properties:
• Almost constant and, in the case of wireline transmission, low delay.
• Very low error rates for the fixed network.
• Network services associated with constant bandwidth.
In contrast, the new networking environment will present end users with new
characteristics like:
• Variations in transmission delay.
• Variations in bit rates.
• Potential loss of data packets.
Thus it appears that with packet-switched networks the users cannot count on a stable
Quality of Service (QoS)ᵇ anymore. These new characteristics represent a challenge in the
design and use of packet-switched networks, since they may lead to user impairments,
depending on the kind of source coding and compression used in the end-systems.
It is generally agreed that very little is known about user expectations or perceptive
mechanisms and user behaviour in this new situation (Bouch et al., 2000b). Furthermore
it is not yet known how objective system quality relates to users' subjective perceptions
of quality. The reason for this situation is that to date the majority of research on QoS is
systems oriented, focusing on traffic analysis, scheduling, and routing. Relatively little
attention has been paid to user-level QoS issues (Bouch et al., 2000a). Moreover, it is not
yet known if and how users make trade-off decisions between varying quality performance
and cost. As a consequence, it is presently difficult to base network engineering on
proper traffic forecasts and real user requirements.
ᵇ The basic Quality of Service (QoS) parameters are: throughput, transit delay, jitter
(delay variance), and error rate. For the numerous definitions of the QoS concept see
Fluckiger (1995). Note that the QoS concept is also applied to a broader scope including
e.g. picture and sound quality as well as security aspects.
At the same time, the range of applications run by users is growing considerably
(Odlyzko, 2000), from traditional point-to-point phone calls to sophisticated
computer-based applications, involving both users and servers. In addition, the last few
years have seen mobile communication become all-pervasive, with telephony and short message
services (SMS) dominating. With mobile users, yet another phenomenon is observed:
Many such users value the ability to communicate freely at least as highly as some
performance measures for the actual service. Two examples may illustrate this observation:
• Audio quality of mobile end devices is obviously very often tolerated at a
level considerably below POTS standards.
• SMS users accept an extremely unwieldy user interface.
This observation is in some ways analogous to the well-known masking effects in
auditory perception (Zwicker et al., 1967), where certain stimuli are not noticed when
some other stimulus is present at a level above a certain threshold. Such effects may
possibly be generalized to a fundamental 'masking principle' where impairments are judged
in the light of the attained benefits, i.e. perturbing stimuli could be masked by more
valued stimuli. However, the inverse effect also exists, describing a 'negative masking',
referred to here as an 'amplifying principle', where negative circumstances amplify a
perturbing stimulus. This might be the case, for example, in emergency situations where a
lack of effective communication quality may cause adverse effects. In contrast to the
quality factors based on technological sourcesᶜ, masking and amplifying quality factors can
hardly be controlled, since they are based on contextual and psychological causes.
To recapitulate, it appears that evolutions in telecommunication technology as well as
the growing number of applications deployed by users reveal a broad field of unanswered
questions concerning quality issues. This lack of knowledge is the driving force behind
our work, aiming to examine the end-user's perception and acceptance of QoS
parameters, thereby answering the question:
• How do network-induced impairments affect the interaction of two or more users of a
telecommunication system?
The investigations to answer the above question are undertaken in the framework of a
project called QEDᵈ (Kündig et al., 2001), which aims at making a substantial contribution
towards quality-based network engineering, emphasising multimodal person-to-person
and person-to-computer communication, and thus a user-centred perspective.
ᶜ which in fact are addressed by the Quality of Service (QoS) concept.
ᵈ QED is the acronym for Quality of Service Expectations for Real-Time Dialogue
Communication, which is accomplished in collaboration with Albert Kündig and Alexander
Braun from the Computer Engineering and Networks Laboratory (TIK) of the Swiss Federal
Institute of Technology Zurich (ETHZ). For his part, Alexander wrote a thesis (Braun, 2003)
with emphasis on technological aspects.
2.2 Scope of Investigation
We will now briefly describe the scope of the investigations undertaken in this thesis.
For this purpose, the following hierarchical diagram (see Figure 1) classifies some
outstanding attributes of communication. The chosen emphasis is drawn bold, whereas
situations not in our focus are drawn grey.
[Figure 1: tree diagram. Emphasised path: Communication, Technologically-Mediated, Real-Time (Synchronous).]
Figure 1 Tree diagram showing the chosen emphasis on technologically mediated dialogue communication in real-time. Note that not all possible connections are depicted.
The emphasis on technologically mediated dialogue communication in real-time has
been chosen for two essential reasons:
• Promising Future Applications
• Predictable User Expectations
In the following the two reasons are briefly explained.
Promising Future Applications
Real-time dialogue communication between users will - despite upcoming new types of applications - most probably remain an important and revenue-generating application in both fixed and mobile telecommunication, and in both private and business communication. Examples of such applications are pure videoconference applications, CSCW tools, or services using UMTS technologies.
Predictable User Expectations
Face-to-face communication between people - which is the unmediated counterpart to technologically mediated communication - requires extremely sophisticated and well-trained pattern recognition skills: in contrast to computer-based pattern recognition, humans are capable of interpreting very subtle variations in facial expression, voice pitch and timing. As a consequence, we are all experts in recognising behavioural deviations from what we consider normal[e]. For this reason it should be easy to model and predict user expectations for technologically mediated interpersonal communication in real-time: since users compare such services with face-to-face communication, they are assumed to expect the application to behave in a way that is equivalent to natural face-to-face communication. This is in contrast to many other applications in the area of man-machine interaction (e.g. browsing the WWW), for which user expectations are difficult to predict. The reason could be that there is no natural equivalent for these kinds of applications.
In summary, it appears that applications should support the fundamental information exchange by the use of audio (hearing each other), video (seeing each other), and shared tools (such as chat, whiteboard, and application sharing), ideally without perceivable differences to the face-to-face situation (which of course is hard to attain). Ultimately, users expect telecommunication services to include the proper conveyance of relevant environmental aspects (e.g. background noise). Moreover - and probably most crucially - technologically mediated real-time communication is expected to allow for temporal patterns that are similar to those of face-to-face communication.
[e] On the other hand, a lifetime is probably not enough to attain perfection in face-to-face communication, in the sense that the communicating partners can be sure that the meanings of their statements are understood in the intended way.
As mentioned in the previous section, it is generally agreed that very little is known about user expectations with regard to QoS issues (Bouch et al., 2000b). On the other hand, it was assumed in this section that users expect an application behaviour that allows communication similar to face-to-face communication. In fact, these contradictory statements delineate the objectives of this thesis: under the assumption that users liken technologically mediated communication to face-to-face communication, we aim at measuring the boundary values of particular QoS-parameters that should not be exceeded in order to allow for a 'feeling of naturalness'. This upper boundary - upper with regard to the lack of quality - is called the acceptance threshold.
Furthermore, it is assumed that people perceive maximum communication quality regarding QoS-parameters in face-to-face situations; or, conversely, that they do not perceive a lack of quality in face-to-face situations. This would mean that in face-to-face communication people are accustomed to maximal information throughput, no delay, no jitter and no errors. Of course this premise is somewhat idealised and, strictly speaking, incorrect in the case of delay: there is always a minimal delay due to the propagation speed of sound and light. The following three reasons illustrate why this premise nevertheless makes sense:
• The transmission delay in face-to-face communication is constant and negligibly small (approximately 3 ms per metre of communication distance).
• The comparison is drawn by means of an idealised face-to-face situation, where no disturbing external influences are present.
• The face-to-face situation is chosen as an idealised point of reference, in order to position the quality perception in technologically mediated communication.
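The figure of roughly 3 ms per metre given in the first reason can be checked with a short calculation. The following is a minimal sketch, assuming a speed of sound of 343 m/s (dry air at about 20 °C); this value and the function name are assumptions made for illustration and are not taken from the thesis. The propagation delay of light is smaller by several orders of magnitude and is neglected here.

```python
# Assumed value: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_PER_S = 343.0


def acoustic_delay_ms(distance_m: float) -> float:
    """One-way propagation delay of sound over distance_m, in milliseconds."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


# Typical conversational distances (illustrative values only).
for distance in (1.0, 2.0, 5.0):
    print(f"{distance:.0f} m -> {acoustic_delay_ms(distance):.1f} ms")
```

Under this assumption one metre corresponds to just under 3 ms, which is consistent with the approximation used above.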
With this premise in mind, we aim at investigating a second threshold, which provides an answer to the question: Which degradations of particular QoS-parameters are 'just' noticed by the users? This question is mainly important for economic reasons, since knowledge of the so-called perception threshold provides network planners and content providers with a basis for decisions. In fact, below the perception threshold, users will not benefit from optimisation of network and end-system infrastructure with respect to QoS parameters.
In summary, the scope of our investigations consists of perception and acceptance
thresholds (see Figure 2) in technologically mediated, real-time communication. Note
that in Figure 2 the face-to-face situation is assumed to be at the point where no
perturbing stimulus is present.
[Figure 2: plot. Horizontal axis: Perturbing Stimulus. Vertical axis: 0% to 100%, with the 50% level marked. Labelled points: Perception Threshold, Acceptance Threshold.]
Figure 2 Delineation of the scope of investigation for this thesis, showing the acceptance and perception thresholds for an arbitrary perturbing stimulus. The curves depict hypothetical user response behaviours.
Straight line: Perception of lack of quality.
Dashed line: Non-acceptance of lack of quality.
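The hypothetical response curves of Figure 2 can be illustrated with a simple model. The sketch below assumes a logistic curve and reads each threshold off at the 50% response level; the logistic shape, the parameter values, and all names are assumptions chosen for illustration, not results or methods of this thesis.

```python
import math


def response_probability(stimulus: float, threshold: float, spread: float) -> float:
    """Assumed logistic curve: share of users responding at a given stimulus level.

    threshold is the stimulus level at which 50% of users respond;
    spread controls how gradually the curve rises.
    """
    return 1.0 / (1.0 + math.exp(-(stimulus - threshold) / spread))


def stimulus_at(probability: float, threshold: float, spread: float) -> float:
    """Inverse of the logistic curve: stimulus level for a given response share."""
    return threshold + spread * math.log(probability / (1.0 - probability))


# Hypothetical parameters (in ms of delay): perception sets in earlier than non-acceptance.
PERCEPTION = {"threshold": 100.0, "spread": 20.0}
ACCEPTANCE = {"threshold": 300.0, "spread": 40.0}

for delay in (50.0, 100.0, 200.0, 300.0, 400.0):
    perceived = response_probability(delay, **PERCEPTION)
    rejected = response_probability(delay, **ACCEPTANCE)
    print(f"{delay:5.0f} ms  perceived: {perceived:.2f}  not accepted: {rejected:.2f}")
```

With these assumed parameters the perception curve lies above the non-acceptance curve at every stimulus level, which mirrors the ordering of the two thresholds in Figure 2.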
2.2.1 Delay as Quality of Service (QoS) Parameter
So far the question of the users' QoS-perception has been addressed in a rather generic manner, incorporating the basic QoS-parameters. The systematic investigation of all these parameters, including interdependencies such as masking and amplifying effects, would require a study design of exorbitant scale. Therefore the investigation is restricted to a selection of QoS-parameters that is considered most relevant, necessary, and feasible. We decided to short-list the QoS-parameter delay, which includes intermedia synchronisation as well as roundtrip delay. The reasons for this choice are set out in the following:
• The timing of interpersonal real-time communication contains important prosodic (or non-verbal) information about the frames of mind of the communicating partners. For example, a longer than accustomed delay between one partner's proposal and the other partner's answer may lead to misinterpretations (e.g. that the latter (1) needs to think about what was said, (2) is not certain of the answer, or (3) simply reacts slowly). Thus, timing plays an important role in appraising individual characters and is therefore considered a key parameter in quality-based network engineering.
• Networked audio-visual communication requires - compared to audio only - more end-system and network resources, since encoding, transmission, and decoding of motion images are very data-intensive, requiring either high network throughput or adequate processing power for compression/decompression in the end-systems (it should be noted that compression/decompression operations usually introduce considerable additional delay). Thus, there are several interdependencies between throughput, compression, and delay. Whereas throughput rates and the extent of compression of underlying network and end-system configurations cannot be directly perceived by the end users, this is not the case for delay. Moreover - beside picture and sound quality - it is the delay of a particular network service that makes throughput and compression perceivable.
• Valid empirical data concerning perception and acceptance of various delay parameters in multimodal communication remain sparse. Although many statements have been made about the upper limit of delay for real-time communication, most of the values either refer to audio only or simply reflect technical limits. An early example is the investigations conducted by Bell Laboratories (Helder, 1966), which were triggered by the introduction of satellites with their inherently large delays when high orbits are used. The results of these investigations are not fully convincing, since Bell Laboratories were probably somewhat biased: they were certainly not interested in finding 'killing arguments' against satellite communication.
2.2.2 Published Results for Perception and Acceptance of Delay
The subsequent tables list selective studies concerning perception and acceptance of
relative and absolute delays. There exists a trade-off between these two delay parameters
in terms of the possibility to set the relative delay to zero by buffering the faster stream
(usually audio), and - on the other hand - accepting additional wastage of network and
end-system resources. Thus, in order to optimise the allocation of resources without impairing the users' quality perception, it is important to have profound knowledge of the detection and impairment potential of both delay parameters.
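The trade-off described above can be made concrete with a small sketch: delaying the faster stream until the slower one arrives sets the relative delay to zero, at the cost of holding data in a buffer. The stream delays used here (60 ms for audio, 180 ms for video) and the function name are hypothetical and serve only as an illustration.

```python
def synchronise(audio_delay_ms: float, video_delay_ms: float) -> dict:
    """Buffer the faster stream so that both streams are presented together.

    Returns the buffering time added to each stream and the resulting delays.
    """
    slowest = max(audio_delay_ms, video_delay_ms)
    return {
        "audio_buffer_ms": slowest - audio_delay_ms,
        "video_buffer_ms": slowest - video_delay_ms,
        "relative_delay_ms": 0.0,      # streams are aligned after buffering
        "absolute_delay_ms": slowest,  # both streams now share the larger delay
    }


# Hypothetical one-way delays: audio is coded faster than video.
before = abs(60.0 - 180.0)
after = synchronise(audio_delay_ms=60.0, video_delay_ms=180.0)
print(f"relative delay without buffering: {before:.0f} ms")
print(f"audio buffered by {after['audio_buffer_ms']:.0f} ms, "
      f"absolute delay of both streams: {after['absolute_delay_ms']:.0f} ms")
```

In this example the relative delay of 120 ms disappears, but the audio stream inherits the full absolute delay of the video stream.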
Relative Delay
As mentioned, due to different compression/decompression needs for audio and video data, the transmission of audio-visual data can result in considerable intermedia synchronisation errors (referred to here as relative delay). Table 5 lists some selected findings about the perception of relative delay for both asynchrony orders: auditory before visual (AV), and visual before auditory (VA). The results of the listed studies - except Steinmetz' findings - showed that AV stimuli were detected more easily than the opposite order. A further result concerned the type of the presented stimuli: synchronisation errors of distinct stimuli are detected more easily than synchronisation errors of the more complex lipreading.
Table 5 Excerpt of studies concerning the perception of asynchronies.
Authors                   Condition                        AV [ms]   VA [ms]
(Dixon et al., 1980)      Lipreading                       131       258
(Dixon et al., 1980)      Hammer hitting peg               75        189
(McGrath et al., 1985)    Drawn moving lips                79        138
(Lewkowicz, 1996)         Bouncing disk                    65        112
(Pandey et al., 1986)     Lipreading with masking noise    n.a.      Slump in performance at 80-120
(Steinmetz, 1996)         Lipreading                       ca. 80    ca. 80
The relative delay thresholds do not vary much, considering the different experimental designs used in the listed studies. Unfortunately, some results lack a detailed specification of confidence levels. Furthermore, no studies were found in the scanned literature that provide psychometric functions of the synchronisation errors. In fact, perception thresholds were rarely obtained by means of psychophysical methods.
Absolute Delay
While transmitting multimedia data from one place to another, it is inevitable that a
certain amount of delay is introduced. For pure audio transmissions this delay can be
kept very small, depending on the system architecture, the coding and compression of
the signal. If there is video data in addition, more delay is added because of the greater
complexity of the video information.
So far a lot of statements have been made about the upper limit of delay for real-time
communication that can be expected of the users. Unfortunately, most of the suggested
values do not reflect the needs of the users but result from technical limits. In Table 6
some selective findings about absolute delay are summed up.
Table 6 Excerpt of studies concerning roundtrip delay.
Authors                     Condition       Delay [ms]
(Yamaguchi et al., 1986)    audio           360
(Chen et al., 1989)         audio           600
(Gonsalves, 1989)           audio           400
(Ranta-aho et al., 1998)    audio-visual    1400 - 1920
(Alfano, 2000)              audio           300 - 800
(Wilkins et al., 1998)      audio-visual    20 (LAN), 380 (WAN)
(Bouch et al., 2000b)       audio-visual    400
(Isaac et al., 1994)        audio-visual    640 - 840
2.2.3 A Psychophysical Approach
Investigating humans' perceptions of external events is an interdisciplinary undertaking involving several branches of study, such as physics, sensory physiology, cognitive and social psychology, and even cultural anthropology. The inclusion of all branches would of course go beyond the scope of this thesis. Therefore we will restrict ourselves to a feasible approach. In our opinion a psychophysical approach is suitable for the investigation of human perception and acceptance of perturbing influences from technologically mediated communication.
The following general definition of psychophysics and its interpretation is offered by John C. Baird and Elliot Noma (Baird et al., 1978) and used by the International Society of Psychophysics: »Psychophysics is commonly defined as the quantitative branch of the study of perception, examining the relations between observed stimuli and responses and the reasons for those relations. This is, however, a very narrow view of the influence it has had on much of psychology. Since its inception, psychophysics has been based on the assumption that the human perceptual system is a measuring instrument yielding results (experiences, judgments, responses) that may be systematically analysed. Because of its long history (over 100 years), its experimental methods, data analyses, and models of underlying perceptual and cognitive processes have reached a high level of refinement. For this reason, many techniques originally developed in psychophysics have been used to unravel problems in learning, memory, attitude measurement, and social psychology. In addition, scaling and measurement theory have adapted these methods and models to analyse decision making in contexts entirely divorced from perception.« Hence, according to this definition, it appears that the term psychophysics is used to denote both the substantive study of stimulus-response relationships and the methodologies used for this study.
In Figure 3 a global model (Krueger, 1994) is introduced in which the psychophysical approach is embedded. The model has been developed as a result of extensive research concerning deterioration of health and well-being, conducted at the Institute of Hygiene and Applied Physiology (IHA) at the Swiss Federal Institute of Technology (ETH). It offers a means to explain different user perceptions of objectively identical interfering environmental stimuli. The model is also assumed to be suitable for investigating technologically mediated communication, since the intermediate technology can be considered an artefact that evokes perturbing influences.
Figure 3 Global model (Krueger, 1994). The model explains the variance of user perceptions observed for objectively identical stimuli.
The model in Figure 3 expresses the basic message that psychological effects may not be disregarded when explaining environmental factors, i.e. the objective world is communicated to a subjective world of mental constructs. Subjective assessments (attribution[f] and affective judgement) are made according to this mental construct system and not according to the objective world directly. As an entrance to the above model, the classical stimulus-response relationship measured by means of psychophysical methodology is diagrammed. This upper layer outlines the topic under investigation, whereas the deeper layers in the model are not the subject of this thesis.
2.3 Structure of the Thesis
After having outlined the background and aims of this thesis, and after having delineated its scope, comprising the chosen psychophysical approach for measuring delay thresholds, the remaining components of the thesis are now briefly described.
[f] For which the Attribution Theory offers a means of explanation. The Attribution Theory was founded by Fritz Heider (1958) and advanced by Harold Kelley (1973), both social psychologists. The theory is seen as relevant - among other things - to the study of event perception. It describes how people explain events and the behavioural and emotional consequences of those explanations.
Chapter 3 deals with the theoretical background in the fields of communication, cognitive psychology and psychophysics, which we consider necessary to elucidate. In the first part of this chapter a taxonomy of communication is developed, which is arranged in layers, with higher-positioned entities influencing the subjacent ones. Since diverse - and sometimes contradictory - theories deal with communication, we give insight into an excerpt that is considered suitable and sufficient for our purposes. In the next part some concepts and ideas are presented, dealing particularly with the neural processing speed of different modalities and with mental representations of time resolved on the neurological level. Subsequently, chapter 3 details the psychophysical background necessary to understand the procedure of the experiments conducted in chapter 4.
Chapter 4 is subdivided into two parts, each describing procedures and results of experiments conducted in the mode of either human-computer interaction (HCI) or human-human interaction (HHI). The outcomes of these experiments are discussed in chapter 5, where concluding remarks are also presented with an emphasis on human factors in network engineering.
The following annex consists of a description of the software called best-PEST calculator, which was programmed in order to run the threshold experiments and which has been developed further into a fully independent, browser-based application in order to make it accessible to a broad audience.
A glossary, together with the cited references and an index of keywords concludes the
thesis.
3 Theory
In this chapter we give an overview of the theoretical background concerning the topic under investigation, particularly the experiments described in chapter 4. First we develop a taxonomy of communication, in which a thread through the theories dealing with communication is established. Subsequently we give insight into current research on the mental representation of time and on processing speed in different modalities. The last part of this chapter deals with psychophysical theory and the methods applied in the experiments.
3.1 A Taxonomy of Communication
Understanding an entity under investigation implies analysing and describing it. When
this entity is too complex models have to be created, which classify objects and concepts.
Of course a model is an approximation of reality; nevertheless it should provide enough
resolution, so that gained insights are reproducible in reality.
The entity under investigation here is communication. Since there is no widely accepted general taxonomy[g] of communication, we develop one in the sense of a 'coordinate system', allowing for a concise description of specific communication settings.
For the users' quality expectations, the communication setting is considered crucial.
Therefore, it is important to avail oneself of appropriate models for such settings.
In the next sections an approach is described, consisting of the five aspects social context, orientation, coding, modality, and timing. It is considered suitable for the investigation of interpersonal communication. It will be shown that these five aspects can - to some extent - be ordered in a layered fashion as shown in Figure 4. The suggested communication layers will be defined and insight into their theoretical provenance will be given. Thereafter the layered order is exemplified, pointing out the interface between the interpersonal communication model and the OSI reference model. Parts of this chapter are published in (Guttormsen Schar et al., 2002).
[g] Taxonomy is the science of the laws of classification. The notion of taxonomy means establishing classes within a set; classes may form partitions, overlapping subsets, or nested subsets.
Technically skilled readers will recognise the interpersonal communication model of Figure 4 as a variant of the well-known 7-layer OSI reference model (see the Glossary for a description). In fact there is a resemblance, and the concepts of these two models are not too far from each other, although they cannot be mapped one-to-one. Rather, they belong to different systems, as depicted in Figure 4, which differentiates between culture, individuals, and technology. The model of interpersonal communication can be thought of as being stacked above the OSI model (technology) and as being subordinate to the cultural context. Before we take a closer look at the interpersonal communication model, we will briefly explain the other two systems, which are not in the focus of our investigation.
With OSI's technological approach, control is passed from one layer to the next. A
communication begins with the application layer on one end (for example, a user working
with a videoconference (VC) application). The information is passed through each of
the seven layers down to the physical layer (which is the actual transmission of bits). On
the receiving end, control passes back up the hierarchy.
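The passing of control down and up the layers can be sketched as a toy example. In the code below each layer simply wraps the data in a labelled envelope on the sending side and removes it again on the receiving side; the string envelopes and the function names are assumptions made for illustration and do not correspond to real protocol headers.

```python
# The seven OSI layers, listed from the top (application) to the bottom (physical).
OSI_LAYERS = [
    "application", "presentation", "session",
    "transport", "network", "data link", "physical",
]


def send(payload: str) -> str:
    """Pass the payload down the stack; each layer adds its own envelope."""
    frame = payload
    for layer in OSI_LAYERS:
        frame = f"[{layer}|{frame}]"
    return frame


def receive(frame: str) -> str:
    """Pass the frame back up the stack; each layer removes its own envelope."""
    for layer in reversed(OSI_LAYERS):
        prefix = f"[{layer}|"
        if not (frame.startswith(prefix) and frame.endswith("]")):
            raise ValueError(f"malformed frame at layer {layer!r}")
        frame = frame[len(prefix):-1]
    return frame


wire = send("hello")
print(wire)           # outermost envelope belongs to the physical layer
print(receive(wire))  # the original payload is recovered at the application layer
```

The outermost envelope is added last on the sending side and removed first on the receiving side, which reflects the order in which control passes through the hierarchy.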
In the system culture, we distinguish between individualistic and collectivistic
cultures. Individualism holds that the individual is the primary unit of reality and the ultimate
standard of value. This view does not deny that societies exist or that people benefit
from living in them, but it sees society as a collection of individuals, not something over
and above them. Collectivism holds that the group - the nation, the community, the race,
etc. - is the primary unit of reality and the ultimate standard of value. This view does not
deny the reality of the individual. But ultimately, collectivism holds that the groups one
interacts with determine one's identity.
[Figure 4: layered diagram of the communication taxonomy. Layers and example attributes: Social Context (Formal, Informal); Orientation (Person, Non-Person); Coding (Verbal, Non-Verbal); Modality (Visual, Auditory); Timing (Synchronous); Application (Interpersonal Real-Time Communication).]
Figure 4 Outline of communication taxonomy in a layered fashion with typical examples. Attributes bounded by dashed lines are not in the focus of our investigation. Reading example: communication settings between people take place either in formal or informal context (Short et al., 1976). The transmitted information concerns the relation between the partners (person-oriented) or the content (non-person-oriented) (Watzlawick et al., 1967). It is encoded verbally or non-verbally, and is received with the aid of either the visual or the auditory sense organs. The timing decides - among others - which applications take place in the considered communication setting.
3.1.1 Social context
The higher layers in the communication model (i.e. social context and orientation) comprise a rather broad range of features. As a consequence, the study of higher-level aspects of communication has been the source of many different theories. In what follows, a thread is established through some of these concepts. It should be noted that most of these approaches were defined for business communication, probably because this area is regarded as most influential when quality requirements are established.
In the following we describe social context according to the aspect degree of formality, i.e. depending on the extent to which formal rules or some codex exist for the exchange of information.
Formal and informal communication
Several theorists make the distinction between formal and informal communication. Smith (1972) defines formal communication channels as »those emanating from official sources and carrying official sanctions [...]. Formal messages usually flow through these channels, thus acquiring legitimacy and authenticity«. On the other hand, informal communication channels »are not specified rationally. They develop through accidents of spatial arrangement, through friendships«. Both formal and informal communication can follow an upward, downward, or horizontal path (to higher, lower, or equal authority). The purposes of formal communication are to command, to instruct, and to finalise matters through the application of regulations. The purposes of informal communication are to educate through information sharing, to motivate through personal contacts, and to resolve conflicts through participation and friendship. It seeks to involve participants in organizational matters as a means of maintaining their enthusiasm, loyalty, and commitment. Table 7 lists some characteristics of formal and informal communication.
Table 7 Characteristics of formal and informal communication.
Formal communication                     Informal communication
official, binding                        unofficial
precise, unlikely to be misunderstood    personal, inaccurate
traceable, can be preserved              hard to trace
can avoid embarrassment                  can refute rumours and gossips
restricted jargon, rigid                 more emotional
authoritarian, likely to be obeyed       less intimidating
fails to motivate                        promotes disclosure of underlying motives
3.1.2 Orientation
Orientation describes to some extent the purpose of a communication setting and the related view the participants should have about the tacit assumptions needed to understand the topics being discussed. In some cases, these assumptions might be very limited in range (e.g. comprising some technical knowledge needed to solve a specific task), while in other cases, a common worldview is necessary for a fruitful discussion.
The summary below shows theories and methods that can be attributed to the orientation layer. It is necessarily far from comprehensive; it should rather be seen as an indication that it is extremely important to define a certain experimental setting properly, using terminology and insight gained from the cited theories.
The Bales Categories
Already in the fifties Bales (1955) ran a series of experiments in which subjects held
simulated meetings. He analysed the nature of the interactions that took place. From
these experiments he elaborated four main categories: positive reactions, negative reactions,
problem-solving attempts and questions. At the Communication Studies Group (CSG), Short
(1976) reduced these four categories to two: Bales' positive and negative reactions were
classed as person-oriented, and the two other categories, problem-solving attempts and
questions were classed as non-person-oriented. The CSG considers person-orientation to be
the core category in understanding communication mediated by teleconferencing.
A further step in developing classification schemes was the SYMLOG[h] space. Elaborated from the large amount of research conducted by Bales (1999), this approach identifies at least three bipolar characteristics that are fundamental for describing communication in small groups. The three dimensions spanning the SYMLOG space are:
• Dominance versus Submissiveness
• Friendliness versus Unfriendliness
• Acceptance versus Non-acceptance of Authority.
These characteristics were implemented in standardised questionnaires and have been applied innumerable times in different cultures. Thus, SYMLOG is supposed to assess person-oriented parts of small group communication reliably.
The distinction between content and relationship
Watzlawick (1966) distinguishes between the content and the relation part of a message, thus establishing a direct link to the terms report and command introduced by Bateson (Ruesch et al., 1951). Watzlawick (1967) aptly points to the correspondence of these terms to the computer science terms data and control. Since control information specifies what is to be done with the data at hand, it can be regarded as 'information on information', i.e. metainformation. The following axiom describes this insight: »Every communication has a content and a relationship aspect such that the latter classifies the former and is therefore a metacommunication.« (Watzlawick et al., 1967). In the case of interpersonal communication, the exchange of control information could be seen as 'downloading applets' to be executed by the communicating persons. In that sense, Watzlawick's view is very close to the categories established by Bales, and we can possibly simplify our taxonomy by understanding a content-oriented approach to be non-person-oriented and, on the other hand, relation to be person-oriented.
How can one express person-oriented and non-person-oriented information? The answer to this question leads to section 3.1.3, where coding is discussed. But first we take a closer look at the orientation layer, introducing the distinction between implicit and explicit information types. Both person-oriented and non-person-oriented information can be of either implicit or explicit nature, and can be expressed by both verbal and non-verbal coding. More precisely, there are no statements conveying solely implicit information; there always needs to be the 'carrying' explicit information too. In contrast, there are statements conveying solely explicit information. An example will illustrate this: »Joe drank ten beers last night« is a statement which is by itself explicit and unequivocal. But depending on the context and the tone of voice in which it is said, it can be understood as »Incredible how much alcohol Joe always drinks!«, which is the implicit meaning. For the sake of simplicity, when we speak of implicit information we mean both the implicit and the carrying explicit information.
[h] SYMLOG is the acronym for SYstematic Multiple Level Observation of Groups.
In our approach there is no sharp distinction between implicit and explicit information types. We differentiate the two by means of the degree of ambiguity, i.e. implicit information is strongly ambiguous, whereas explicit information is slightly ambiguous or not ambiguous at all. The use of the term ambiguity likewise implies that the communicating partners have mutual and tacit assumptions about the rules of their information exchange[i]. This means that partners should agree on, and be aware of, the 'rules' of ambiguity[j]; fulfilment of this requirement is - among others - a job of the education system, imparting verbal and cultural literacy. To clarify the distinction between implicit and explicit information types, see Table 8 (page 33), where some examples are given.
3.1.3 Coding
Information can be encoded and transmitted in many different ways, and different
forms of communication with specific codes may be used concurrently. A voice signal
can conceptually be decomposed into a verbal part and a non-verbal part (the so-called
prosodic information, like pitch, melody, level and timing). On the one hand, non-verbal
features of both visual and auditory modalities convey information that allows us to interpret a
message properly (e.g. to differentiate between a question and an exclamation), or to
make out the speaking person (e.g. by means of moving lips); on the other hand,
they help us to identify a known person or to guess his/her state of mind.
[i] Otherwise - as a consequence - they would have to accept an impaired conversation (as it might happen between partners speaking different languages), or they might have to oversimplify the topic.
[j] We are not considering the particular meanings of an ambiguous statement to be vague, unclear, or obscure - far from it - they are very precise and clear; ambiguity is caused by the fact that one is not sure which of the meanings can be accepted as true.
30 CHAPTER 3. THEORY
In addition to this, we should consider the different recognition capabilities of different
coding[k]. Weidenmann (1988) points out this aspect referring to the learning
characteristics of different media types. He states that - when choosing the appropriate learning
media - our different familiarity in handling words and pictures comes into play.
Whereas linguistic skills like reading and writing (i.e. verbal skills) are systematically
trained in our educational system, the competence of handling instructional pictures (i.e.
non-verbal skills) needs to be developed. As far as we can see, the two outstanding
attributes on the coding level are the verbal and the non-verbal information types, as
described below.
Verbal Coding
We speak of verbal coding when written and/or spoken languages and/or numbers are
used, and when mechanisms (i.e. grammars or lexica) exist through which the correctness
of a text or utterance and its meaning can be determined. Thus, verbal coding can itself
be seen at different levels, as shown in Figure 5. It should be noted that the higher we
move up in Figure 5, the more difficult it becomes to set up grammars and lexica as a
formal and comprehensive basis. In fact, the complete interpretation of text and speech
is partly dependent on the semantic level, and to some extent also on the pragmatic level,
which is in addition represented by the orientation and the social context layers as discussed
in sections 3.1.2 and 3.1.1 (page 27).
[k] Note that we are considering external coding here, unlike internal coding which is used in cognitive psychology with emphasis on the mental coding of input and the resulting information processing.
3.1 A Taxonomy of Communication
[Figure 5: layered diagram, from top to bottom; left-hand label: Semiotics]
  Pragmatics: Derivation of actions from the meaning.
  Semantics: Generally accepted meaning of words, sentences and texts.
  Syntax: Rules by which words are combined to make sentences and texts.
  (lowest layer, label not legible): Rules by which signs are combined to make words.
Figure 5 A layered view of verbal information. Note that the attributes given on the right side are typical examples.
Non-verbal Coding
Non-verbal coding is often associated with many different kinds of pictorial representation
(e.g. gestures and facial expressions conveyed through video, graphs and pictures,
animations, pictograms, icons, etc.). Also, as already described, the prosodic features
contained in a speech signal represent non-verbally coded information, as does any
non-verbal sound, for example instrumental music. Furthermore, it should be noted that various
forms of background information (both visual and auditory) might supply important
information about the context of a communication session. For example, hearing the
background noise of a railway station makes a phone call more credible when the subject
of the call is train delays - or even more so when seeing the cabin in the background
if using e.g. a UMTS device.
3.1.4 Modality
One of the most basic conditions for participation in any communication event is the
sensation and the perception of the transmitted signals conveying information. This
involves the human sense organs, which are able to detect light, sound, smell, taste, touch
and position, each corresponding to one specific mode (often, the term channel is used
alternatively). Although future developments in telecommunications might bring the
introduction of olfactory (smell) and haptic (touch) modes in special contexts (e.g.
telesurgery), we will restrict ourselves for the time being to the auditory and visual channels.
Multimodal Communication: Audio-Visual
Whenever the auditory and the visual channels are simultaneously invoked in a
communication setting, we normally speak of multimedia communication. Increasingly, the
alternative term multimodal communication is used, where 'multi' does not just imply 'sound and
vision', but the fact that several different forms of communication (in the sense of
submodes) can be implemented within both the auditory and visual channels. For example,
the visual channel is involved when a video signal represents a 'head and shoulder'
picture of the communication partners; alternatively, it is used as well when text and
graphical information are exchanged in shared workspace applications. The latter application
usually comprises still another supporting communication mode in the form of a separate
channel linking a mouse or a joystick simultaneously with a local and a remote pointer.
These examples belong to the coding layer in our communication model, since the
submodes differentiate themselves through different forms of coding.
Table 8 Exemplification of the three layers orientation, coding and modality (including implicit/explicit distinction). The implicit messages are made explicit in the only explicit column (in the verbal-coding row only). Note that the distinction between implicit and explicit is made by means of the degree of ambiguity.

Verbal coding, visual modality
  Person oriented, implicit and explicit: Reading 'between the lines' some relational information. e.g. »Mr. Miller has extraordinary interpersonal skills.«
  Person oriented, only explicit: Written text with relational information. e.g. »Big parts of his attendance time Mr. Miller was chatting with his colleagues.«
  Non-person oriented, implicit and explicit: Reading 'between the lines' some task-related information. e.g. »Maintaining job security will be a big challenge in near future.«
  Non-person oriented, only explicit: Written, task-related text. e.g. »We will have to lay off workers next month.«

Verbal coding, auditory modality
  Person oriented, implicit and explicit: Hearing 'between the lines' some relational information. e.g. »Are you sure of what you are talking about?«
  Person oriented, only explicit: Spoken text with relational information. e.g. »I doubt your competence.«
  Non-person oriented, implicit and explicit: Hearing 'between the lines' some task-related information. e.g. »The Porsche engine still uses traditional injection.«
  Non-person oriented, only explicit: Spoken, task-related text. e.g. »The Porsche engine has to be reengineered.«

Non-verbal coding, visual modality
  Person oriented, implicit and explicit: Extracting relational information from gazes, gestures, images, etc. e.g. avoiding eye contact
  Person oriented, only explicit: Gazes, gestures, images, emoticons [e.g. :-(] etc. with relational information. e.g. the referee showing the yellow card
  Non-person oriented, implicit and explicit: Extracting task-related information from gazes, gestures, images, etc. e.g. configuration of lines indicating a 3D-cube
  Non-person oriented, only explicit: Task-related gestures, images, icons etc. e.g. showing a picture of a defect of an aeroplane

Non-verbal coding, auditory modality
  Person oriented, implicit and explicit: Extracting relational information from pitch, volume, staccato, etc. of voice and sounds. e.g. talking with higher pitch to someone
  Person oriented, only explicit: Pitch, volume, etc. of voice and sounds with relational information. e.g. hooting, cheering
  Non-person oriented, implicit and explicit: Extracting task-related information from pitch, volume, etc. of voice and sounds. e.g. rattle noise from a vehicle
  Non-person oriented, only explicit: Task-related pitch, volume, etc. of voice and sounds. e.g. the sound of a beep instead of a censored word in a spoken sentence
3.1.5 Timing
As can be seen in Figure 4 we distinguish between synchronous and asynchronous
interactions[l]. Simplifying things, we note that the main difference between the two timing
categories is the time magnitude of the interactions between the communicating partners.
Whereas synchronous interaction is in the range of milliseconds to seconds,
asynchronous interaction is in the range of minutes to hours, or even days to weeks. Since we
restrict the model to synchronous and asynchronous timing, we implicitly restrict ourselves
to dialog or interactive communication.
According to Fluckiger (1995) the timing of interaction decides which applications
take place in the considered communication setting. Examples of synchronous or
real-time interaction are:
• Interpersonal applications: Only two individuals are involved. Also called
person-to-person applications, and sometimes called one-to-one applications.
• Distribution applications: Sometimes called person-to-group applications, where
multimedia information such as live audio and video is transmitted from
one source to multiple recipients in a one-way mode (no return channel
from the recipient to the source). This is analogous to TV broadcasting.
• Group teleconferencing: Sometimes also called group-to-group teleconferencing,
which is a generic term referring to bi-directional conversational
communication between two or more groups of people.
Examples of asynchronous interaction are:
• Multimedia e-mail: This is the conventional e-mail where the documents
exchanged are not only plain text, but also include rich text, hyperlinks, and
audio or video sequences.
• Asynchronous computer conferencing: Refers to a service where people exchange
multimedia messages asynchronously. The technique often consists of
submitting or retrieving contributions to or from centralised servers.
[l] We define a basic interaction unit as one reciprocal action, consisting of an action triggered by a source, echoed by a sink, and received by the source again.
Since asynchronous communication is not the focus of our investigations, we will
only consider synchronous (i.e. real-time) settings in the following. Within these real-time
settings we focus on network-mediated interpersonal communication as well as on
people-to-systems communication.
Absolute Delay
The main issue of our investigation of real-time communications concerns absolute
delay, which is only perceivable in dialog settings. Strictly speaking, typical one-way
applications like e.g. video-on-demand also have a dialog part, namely between sending the
request and receiving the video stream. This means that one-way applications also let the
user perceive absolute delays in an initial phase, but as soon as the connection is established
the user is not aware of absolute delays anymore, so that the term one-way for such kind
of applications is justifiable.
In Figure 6A we depict the definition of the absolute delay in the way users of real-time
dialog applications are aware of it. In the same figure there is also the technologically
inspired definition of the term round-trip delay, which is sometimes used synonymously. We
define absolute delay as the elapsed time between the expression of an auditory, visual or
tactile trigger and the answer from a communicating partner (human or machine). I.e.,
the acting user at the source sets a primary internal time marker when executing an
expression, and a secondary one when perceiving the answer. The estimation of the elapsed
time between these two markers is what the acting user perceives as absolute delay. It
remains for the time being an open question whether the acting user sets the time
marker at the time he/she perceives his/her own expression, or at the time he/she is
planning to produce it. The absolute delay consists of:
• two network transit delays (bi-directional)
• two times the depacketising delay
• the source encoding and decoding
• the sink echo processing
• the neural transit delay of the user at the source receiving the answer
In Figure 6B we depict a magnified view of the sink echo processing consisting of:
• the sink encoding and decoding
• the reacting user's reaction time
On its part the reaction time consists of:
• the neural transit delay between peripheral excitation and conscious perception
• the cognitive processing time
• the time needed to produce and execute the output stimulus
Depending on the point of view, a particular user in a real-time dialog setting has both
roles: for oneself that of the source and for the partner that of the sink. Hence, when
perceiving the absolute delay the user does - beside the technologically generated delays
- estimate the reaction time of the partner, but not his/her own reaction time. I.e., the
perceived and estimated source echo processing time is not equal to the sink echo
processing time.
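The composition of the absolute delay listed above can be illustrated by a minimal sketch in Python. It merely sums the components named in the text; all class and field names, and all numeric values in the usage example, are assumptions chosen for illustration and are not measurements from our experiments.

```python
from dataclasses import dataclass


@dataclass
class ReactionTime:
    """Reaction time of the reacting user; all values in milliseconds."""
    neural_transit_ms: float        # peripheral excitation to conscious perception
    cognitive_processing_ms: float  # cognitive processing time
    stimulus_production_ms: float   # producing and executing the output stimulus

    def total(self) -> float:
        return (self.neural_transit_ms
                + self.cognitive_processing_ms
                + self.stimulus_production_ms)


@dataclass
class SinkEchoProcessing:
    """Sink decoding and encoding plus the reacting user's reaction time."""
    decoding_ms: float
    encoding_ms: float
    reaction: ReactionTime

    def total(self) -> float:
        return self.decoding_ms + self.encoding_ms + self.reaction.total()


@dataclass
class AbsoluteDelay:
    """Components of the absolute delay as perceived by the acting user."""
    network_transit_1_ms: float  # source to sink
    network_transit_2_ms: float  # sink to source
    depacketising_ms: float      # counted twice, once per direction
    source_encoding_ms: float
    source_decoding_ms: float
    sink_echo: SinkEchoProcessing
    acting_user_neural_transit_ms: float  # acting user receiving the answer

    def total(self) -> float:
        return (self.network_transit_1_ms
                + self.network_transit_2_ms
                + 2 * self.depacketising_ms
                + self.source_encoding_ms
                + self.source_decoding_ms
                + self.sink_echo.total()
                + self.acting_user_neural_transit_ms)


# Hypothetical values, for illustration only.
example = AbsoluteDelay(
    network_transit_1_ms=40,
    network_transit_2_ms=40,
    depacketising_ms=5,
    source_encoding_ms=10,
    source_decoding_ms=10,
    sink_echo=SinkEchoProcessing(
        decoding_ms=10,
        encoding_ms=10,
        reaction=ReactionTime(
            neural_transit_ms=30,
            cognitive_processing_ms=100,
            stimulus_production_ms=60,
        ),
    ),
    acting_user_neural_transit_ms=30,
)
total_ms = example.total()
```

Keeping the human components (reaction time, neural transit) in separate fields mirrors the distinction made in the text between technologically generated delays and human information processing time.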
[Figure 6: A: timeline of a dialog interaction showing the absolute delay and the round-trip delay, with the segments source coding, network transit delay 1, sink decoding, sink echo processing, sink coding and network transit delay 2, and the markers "first bit transmitted by source", "first bit received by sink", "last bit received by sink", "first bit transmitted by sink" and "first bit received by source". B: reaction time of the reacting subject (neural transit delay, conscious perception of reacting subject, cognitive processing time, stimulus production) and neural transit delay of the acting subject.]
Figure 6 A: Schematic diagram of the round-trip delay according to Fluckiger (1995), and of the absolute delay according to our definition. Grey shaded areas indicate human information processing time. B: Magnified view of the reacting subject's reaction time, and the neural transit delay of the acting subject.
Relative Delay
In contrast to absolute delay, relative delay is perceivable also in one-way settings. We
define relative delay as the time difference between the appearance of the visual stimulus
and the appearance of its appendant auditory stimulus in an audio-visual presentation.
Furthermore we distinguish between the possible orders of the incoming stimuli:
Auditory precedes visual (AV), visual precedes auditory (VA), or they are in sync, i.e. there is
no relative delay. Relative delay is sometimes referred to as intermedia synchronisation, or lip
synchronisation, pointing to the particular synchronisation requirements needed either to
give the feeling of naturalness in audio-visual telecommunications, or to enable or
enhance lip reading for hearing impaired people (e.g. to optimise hearing aids). These areas
comprise a rich body of literature, e.g. (McGrath et al., 1985; Pandey et al., 1986;
Summerfield, 1992; Kouvelas et al., 1996; Steinmetz, 1996; Stone et al., 1999; Oviatt et
al., 2000; Stone et al., 2001; Van Hoesel et al., 2002), which in fact rarely treats perception
thresholds obtained by means of psychophysical methods. Further studies investigated
the intermedia synchronisation by means of distinct stimuli like bouncing disks
(Lewkowicz, 1996), or a hammer hitting a peg (Dixon et al., 1980). (See also section 2.2.2
Published Results for Perception and Acceptance of Delay).
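The definition of relative delay and of the two stimulus orders can be sketched in a few lines of Python. The function name, the onset parameters and the optional tolerance_ms (below which two onsets are treated as in sync) are assumptions introduced for this illustration only; the definition in the text itself contains no tolerance.

```python
def relative_delay(visual_onset_ms, auditory_onset_ms, tolerance_ms=0.0):
    """Classify the stimulus order and return (order, magnitude in ms).

    order is "AV" when the auditory stimulus precedes the visual one,
    "VA" when the visual stimulus precedes the auditory one, and
    "sync" when there is no relative delay (within tolerance_ms).
    """
    difference = visual_onset_ms - auditory_onset_ms
    if abs(difference) <= tolerance_ms:
        return "sync", 0.0
    if difference > 0:
        # The visual stimulus appears later: auditory precedes visual.
        return "AV", difference
    # The visual stimulus appears earlier: visual precedes auditory.
    return "VA", -difference


# Hypothetical onsets in milliseconds.
order, magnitude = relative_delay(visual_onset_ms=120, auditory_onset_ms=40)
```

Returning the order separately from the magnitude keeps the two stimulus orders distinct, which matters because they are not necessarily perceived in the same way.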
3.1.6 Exemplification of the interpersonal communication model
In the following we will explain the interpersonal communication model by means of a
videoconference (VC) user. Furthermore we will point out the interface between the OSI
and the interpersonal communication model, when we consider the interaction timing of
different applications.
Before the VC user starts sending information through the videoconference
application, s/he will be aware of the social context in which the communication setting will
take place. In our approach, this means that s/he knows if the communication partner
belongs e.g. to the family, to the workmates, to the circle of friends, or to the circle of
acquaintances etc. Consequently s/he also has an idea of the hierarchical position of the
communicating partner, of the overall importance of the event and the like. We subsume
these factors, saying that the user is aware of the degree of formality of the event.
Furthermore we predicate that the degree of formality determines the communication
process to come. That is, the communicating partners will choose an appropriate language[m]
as well as modify the topics of the conversation, voice pitch, gestures, gazes, and
interaction timing. When we consider these modified aspects separately, deeper layers in the
communication model will be probed.
As already stated, communication between people comprises content
information, including also the problem or purpose, and metainformation, i.e. implicit
information concerning the intended meaning of the verbal, usually ambiguous content.
Metainformation usually reveals the relation between the partners, and is therefore
considered as person-oriented information, unlike the non-person-oriented information, which is
the 'real' content. The orientation of the information stream to be sent to the
communication partner is the next crossroads that the VC user has to pass.
According to her/his appraisal of the actual communication setting (which also includes the
problem to solve), s/he will direct the information flow more towards the partner or
more towards the task. And s/he will choose a more explicit or a more implicit way to
express her/his message. Again, this influences the following layers.
In order to illustrate how the higher layers influence the coding of a message, let us
assume two examples for the use of a videoconference, which both are in a formal
context:
• Two industrial designers are working on improving the ergonomics of a drilling machine.
• Superior and employee are talking about the employee's personal performance.
First of all these examples show that the choice of verbal and non-verbal coding
respectively is determined by the problem to solve. The designers primarily will choose sketches
and schemes to solve the problem, whereas the superior will talk to the employee before
writing a letter of reference. Thus it appears that the purpose determines the orientation
of the conversation, and furthermore the suitable coding: The coding is mainly
non-verbal in the case of the (non-person-oriented) designers, and is mainly verbal in the case
of the (person-oriented) superior. The chosen examples are not necessarily typical
examples; there are probably more examples proving the contrary, e.g. using non-verbal
coding in person-oriented communication and using verbal coding in non-person-oriented
communication. Hence it appears that these examples do not imply rules for the use of
the particular coding. They only exemplify the layered order of our model. We are
suggesting that depending on a particular communication setting, there are tacit agreements
about the accepted and optimal manner to encode the message. Referring to the two
examples this means that it is probably not helpful to use spoken and written
language to a great extent in order to improve the ergonomics of a drilling machine. And it is unusual
and probably not accepted by the employee to be assessed only by charts and diagrams;
a personal word is expected here.
[m] For instance, they restrict their vocabulary, if they consider themselves in a formal conversation.
Actually the next entity, modality, need not be underneath the coding, in terms of
being determined by it. But it makes sense when we consider the degree of
consciousness which is necessary either to perceive or to decode a message. Perceiving visual and
auditory information is handled by the sense organs, their corresponding neurological
pathways and by the visual and auditory cortex, whereas decoding verbal and non-verbal
information involves higher levels of information processing and consciousness. In
short: An amoeba is capable of detecting light, but will fail to extract abstract information
from a visual pattern.
Considering the relative delay of an audio-visual event, where sound is preceding the
visual component, we are leaving the 'natural' frame of reference: in a natural
environment there is no sound preceding the corresponding visual event, whereas the contrary
situation - sound is lagging the corresponding visual component - is familiar to
everyone, e.g. seeing first a hammer hitting a peg before hearing the knock. A comparable
reasoning can be followed in regard to absolute delay: audio-visual communication in natural
environments creates no bigger transit delays than sound needs to travel through the
range of vision, whereas in technologically mediated communication this delay can be
theoretically of any value above a minimal delay due to physical constraints.
Recapitulating, it appears that fundamental characteristics of the timing layer concerning order, or
asynchrony are not found in face-to-face communication. In contrast to that, all
characteristics of the higher layers in our communication model are found - together with
technologically mediated communication - also in face-to-face communication. This fact
predestines the timing to be the most basic layer, representing the interface to the system
technology, which, on its part, is instantiated by the OSI model.
The basic layers timing, modality and coding of the communication model in Figure 4 are
mainly of elementary nature. They can be regarded as absolute prerequisites for any
communication between people. Procedures for the investigation of these layers are
expected to be manageable. This is not the case for the upper layers orientation and social
context, where many diverse situations are conceivable, usually very much depending on
the nature of the tasks performed. Moreover, at these layers psychological characteristics
of the involved persons will play an important role; thus, the character of the involved
people, and - after all - group dynamics may have to be taken into account when
designing experiments or interpreting their results.
It has been found that the investigation of most multi-participant (dialog or
conversation type) settings is a move into 'terra incognita', i.e. generally accepted research
approaches do not exist, and most often there is a lack of methods, taxonomies and even
proper definitions of the entities under investigation. This is especially true for
psychophysics, where traditionally many problems associated with 'one-way' situations were
investigated (humans as stimulus receivers), and where, on the other hand, research
concerning dialog settings appears to be extremely sparse.
3.2 Processing Time of Auditory and Visual Stimuli
When we consider the perception of events conveying coexistent information of
different modalities, such as - in our case - auditory and visual, we have to take into
account that different receptors and perceptual pathways are involved for different
modalities. Therefore it is natural to consider the possibility of different processing
times in different modalities. In fact there are differences. In the following we will
present two ways of determining them: indirectly through differences in reaction time for
different modalities, and directly through measurement of Event-Related Potentials (ERPs).
3.2.1 Indirect: Reaction Time Differences
Reaction time has been a favourite subject of experimental psychologists since the
middle of the nineteenth century. Thereby three basic kinds of reaction time experiments
have been conducted.
• Simple reaction time experiments
• Recognition reaction time experiments and
• Choice reaction time experiments
In simple reaction time experiments, there is only one stimulus and one response. When the
stimulus appears, the response is required as fast as possible. In recognition reaction time
experiments, there are some stimuli that should be responded to and others that should
be ignored. And in choice reaction time experiments, the experimental subject must give a
response that corresponds to the stimulus, such as pressing a key corresponding to a letter
if the letter appears on the screen.
Since the beginning of reaction time research, many researchers have confirmed
that reaction to sound is faster than reaction to light. The accepted figures for mean
simple reaction times for college-age individuals are about 190 ms for visual stimuli and
about 160 ms for auditory stimuli (Galton, 1899; Fieandt et al., 1956; Brebner et al., 1980;
Welford, 1980). Differences in reaction time between these types of stimuli persist
whether the subject is asked to make a simple response or a complex response (Sanders,
1998). The time for motor preparation (e.g., tensing muscles) and motor response is the
same in all three types of reaction time test, implying that the differences in reaction time
are due to processing time (Miller et al., 2001).
Hence, there is evidence from reaction time experiments that the mean processing
time of auditory stimuli is about 30 ms shorter than the mean processing time of visual
stimuli. On the other hand there is also evidence that processing speeds are not fixed
values; rather, they are influenced by various forms of facilitation effects: The difference
between reaction time to visual and auditory stimuli can be eliminated if a sufficiently
high visual stimulus intensity is used (Kohfeld, 1971). Cross-modal facilitation can be
demonstrated with experiments showing that reactions to multimodal inputs
presented in close spatial and temporal proximity are typically faster and more accurate than
those made to the unimodal stimuli alone (Hershenson, 1962; Welch et al., 1986; Giard et
al., 1999; McDonald et al., 2000).
3.2.2 Direct: Event-Related Potentials (ERPs)
Electroencephalography (EEG) provides a non-invasive technique to
directly measure the processing speed of different modalities: Embedded within EEG signals
are short-term transient waves known as Event-Related Potentials (ERPs). These waveforms
reflect the singular experience associated with an external stimulus such as an auditory or
visual event.
When a stimulus is presented to a subject, and brain activity is recorded following the
presentation of the stimulus, an ERP can be recorded. I.e. the voltage fluctuations
recorded at the surface of the scalp contain elements specific to the presented stimulus.
Typically, ERPs are largely contaminated by other activities of the brain. By averaging
across several tens or hundreds of trials, individual ERPs become apparent. A specific
ERP becomes evident by adding a series of individual EEG samples time-locked to the
evoking stimulus. By summing these samples, the background brain activity, which is
assumed to vary randomly over time, will tend to average out.
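The averaging principle described above can be demonstrated with a small simulation, sketched here in Python using only the standard library. The waveform shape, the noise level, the number of trials and all function names are assumptions made for the illustration; the sketch uses synthetic Gaussian noise in place of real background EEG and does not reproduce any recording or analysis pipeline.

```python
import math
import random


def simulate_trial(template, noise_sd, rng):
    """One time-locked trial: the stimulus-specific waveform plus random noise."""
    return [sample + rng.gauss(0.0, noise_sd) for sample in template]


def average_trials(trials):
    """Point-by-point average of trials that are time-locked to the stimulus."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]


def rms_error(signal, template):
    """Root-mean-square deviation of a signal from the noise-free template."""
    squared = sum((a - b) ** 2 for a, b in zip(signal, template))
    return math.sqrt(squared / len(template))


rng = random.Random(0)  # fixed seed so that the illustration is repeatable

# Hypothetical ERP: a single deflection of amplitude 5 peaking at sample 50.
template = [5.0 * math.exp(-((t - 50) ** 2) / (2 * 8.0 ** 2)) for t in range(200)]

# Background activity (standard deviation 20) is much larger than the ERP.
trials = [simulate_trial(template, noise_sd=20.0, rng=rng) for _ in range(400)]

single_error = rms_error(trials[0], template)
averaged_error = rms_error(average_trials(trials), template)
```

Because the simulated noise is independent from trial to trial, its contribution to the average shrinks roughly with the square root of the number of trials, whereas the time-locked waveform is preserved.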
Accepted figures of visual processing time derived from ERP studies are between 45
ms and 55 ms as represented by the onset of the earliest cortical potential (Clark et al.,
1995; Clark et al., 1996; Foxe et al., 2002). On the other hand, the earliest auditory
evoked potential reaches the cortex between 9 ms and 15 ms (Celesia et al., 1971;
Vaughan et al., 1988), or in less than half the time of visual input, approximately 30 - 40
ms earlier than the visual stimulus. The consequence of different processing times is
that asynchronies of audio-visual events are perceived differently depending on the
stimulus order: The same relative delay would evoke a bigger
perceived delay when auditory precedes visual than in the opposite order (see Figure 7).
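The asymmetry between the two stimulus orders follows from simple arithmetic, sketched below in Python. The two latency constants are assumed midpoints of the ranges quoted above (45 - 55 ms visual, 9 - 15 ms auditory), the function name is hypothetical, and the model is a deliberate simplification that treats the perceived delay as the difference between cortical arrival times and disregards any audio-visual interaction.

```python
VISUAL_LATENCY_MS = 50.0    # assumed midpoint of the 45 - 55 ms range
AUDITORY_LATENCY_MS = 12.0  # assumed midpoint of the 9 - 15 ms range


def perceived_relative_delay(physical_delay_ms, order,
                             visual_latency_ms=VISUAL_LATENCY_MS,
                             auditory_latency_ms=AUDITORY_LATENCY_MS):
    """Difference between cortical arrival times for a given physical delay.

    order "AV": the auditory stimulus precedes the visual one.
    order "VA": the visual stimulus precedes the auditory one.
    """
    if order == "AV":
        auditory_onset, visual_onset = 0.0, physical_delay_ms
    elif order == "VA":
        visual_onset, auditory_onset = 0.0, physical_delay_ms
    else:
        raise ValueError("order must be 'AV' or 'VA'")
    auditory_arrival = auditory_onset + auditory_latency_ms
    visual_arrival = visual_onset + visual_latency_ms
    return abs(visual_arrival - auditory_arrival)


# The same physical delay of 60 ms in both orders.
av_delay = perceived_relative_delay(60.0, "AV")  # 60 + (50 - 12) = 98 ms
va_delay = perceived_relative_delay(60.0, "VA")  # 60 - (50 - 12) = 22 ms
```

Under these assumptions the latency difference of 38 ms is added to the physical delay in the AV order and subtracted from it in the VA order.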
However, this effect might be compensated according to recent findings: two
studies (Giard et al., 1999; Molholm et al., 2002), which investigated the integration of
audio-visual (AV) information by means of ERPs, showed an early AV effect after 46 ms over
the right parieto-occipital scalp. This finding suggests that the auditory part of AV inputs
modifies early visual sensory processing and leads to the following interpretation: Firstly,
auditory input activates primary auditory cortex (A1) within 15 ms after stimulus
presentation and is then transmitted up the auditory processing stream. This input is then
projected to visual areas. The critical issue is one of timing. The question is whether there is
sufficient time for auditory input to reach early visual areas to result in modulation of the
later arriving visual input. Given the above mentioned processing times between the
initial auditory and visual inputs to their respective primary cortices, there is a window of 25
ms - 30 ms in which the auditory evoked process can prepare visual areas for arriving
visual evoked processes.
[Figure 7: timeline with axis t (ms) from 0 to 120, showing the perceived relative delay AV and the perceived relative delay VA]
Figure 7 Effect of the different processing times of auditory and visual stimuli in the human brain, disregarding the early AV effects described by Molholm (2002). Grey shaded areas indicate processing time for both modalities.
3.3 Mental Representation of Time
Synchronous interaction is immediate. As we know from real-life situations, the term
immediate is used with considerable tolerance. In some situations the reaction to a request
should be as fast as possible, whereas other situations allow for a reaction after a certain
delay, e.g. after a commenced workstep is accomplished. Anyway, whenever an
immediate reaction is required, it is expected to be executed now. Therefore the term now - which
means the present - is subject to considerable tolerance too.
This real-life experience has an analogy in the philosophical discourse: If one argues
on an abstract level, the present can be considered as the dimensionless border between
the past and the future, thus the present does not last since it is a timeless cut-off point.
On the other hand, we know by experience that the present has a certain duration, i.e. we
are aware of the present and we can easily distinguish between what is now, what has
been before and what is still to come. Otherwise we would be riven between past and
future. This discrepancy between experience and theory represents a profound problem,
and philosophers have been dealing with it since antiquity (for some examples see Pöppel
(1997a)). Since we focus on phenomenological reality, we are not treating the abstract
connotation of the present, but the experienced 'nowness', which is called subjective present
(Stern, 1897; Pöppel, 1978).
3.3.1 Low Frequency Processing
Given that subjective present is experienced as a certain amount of time, how can it be
determined then? Pöppel (1997a) describes some experiments dealing with the duration
of subjective time. In the following we give an excerpt of these experiments, concerning
the visual and the auditory modality.
Figure 8 shows an ambiguous line drawing, named after its discoverer Louis Albert
Necker. It is a wire-frame drawing of a cube in isometric perspective, which means that
parallel edges of the cube are drawn as parallel lines in the picture. When two lines cross,
the picture does not show which is in front and which is behind. This makes the picture
ambiguous, i.e. it can be interpreted in two different ways. When a person stares at the
picture, it will often seem to flip back and forth between the two valid interpretations. In
order to reproduce what follows, it is helpful to make ourselves familiar with both perspectives.
The black spot in the corner of the cube in Figure 8 is an aid to envision the two
perspectives: in one perspective it is in the foreground of the cube, in the other it is in the
background. Once we are capable of swapping deliberately between the two perspectives,
an experiment can be conducted demonstrating the scope of the human time integration
capability: We stare at the cube and try to hold one perspective as long as possible. What
happens then is that after a few seconds the perspective swaps automatically. Now we
try to hold the swapped perspective as long as possible. We will notice that once again
after some seconds the cube swaps against our wishes. A possibility to overcome the
cube's forced swapping is staring at an arbitrary point of the cube and trying to think of
something different. As a result, the cube remains stable, because we have banned it from
consciousness.
Figure 8 The Necker Cube is an optical illusion first published in 1832 by the Swiss crystallographer Louis Albert Necker. It offers a means to estimate the duration of the subjective present.
The spontaneous alternation of ambiguous figures is an effect that is also observed in
the auditory modality. A similar experiment can be conducted with e.g. the
ambiguous phoneme sequence CU - BA - CU - .... For some seconds one hears BACU,
whereupon for another couple of seconds one hears CUBA (Pöppel, 1997b). The spontaneous
alternation rate in the two modalities suggests that a low-frequency mechanism
binds successive events of up to 3 s (Pöppel, 1994) into perceptual units. After this
period attentional mechanisms are elicited that open sensory channels for new information;
if the physical stimulus remains the same, the alternative interpretation of the stimulus
will gain control. Metaphorically, at intervals of up to 3 s the brain scans the sensory
inputs and asks: »what is new?«
Evidence for the 3-seconds-hypothesis is also supplied by experiments using other
paradigms. Studies on the temporal reproduction of stimuli of different durations show
that stimuli of up to 3 s are reproduced almost faithfully. Longer stimuli are reproduced
as significantly shorter and with much greater variability (see Figure 9). Intervals of up
to 3 s can be mentally preserved, or grasped as a unit, whereas longer stimuli are likely
to be squeezed into the 3 s interval.
[Figure 9: plot of the reproduced duration against the duration of stimulus (s), both axes from 0 to 7.]
Figure 9 Example for the reproduction of temporal stimuli between 0.5 and 7 s duration from one subject. Stimuli were given in random order. A continuous light was used as stimulus. At S=R, stimulus duration equals reproduction. ALth is the geometric mean of all stimulus durations. Note that for stimuli longer than 3 s temporal reproduction remains short. Data from Pöppel (1971).
3.3.2 High Frequency Processing
Evidence for a high-frequency processing system comes, in part, from studies on
temporal order thresholds (Hirsh et al., 1961; von Steinbüchel et al., 1996). If experimental
subjects have to indicate the temporal order of two stimuli, a threshold of 30 ms is
observed, independent of sensory modality. Data picked up within 30 ms are treated as
co-temporal, that is, a relationship between separate stimuli with respect to the
before-after dimension can no longer be established. This does not mean that the central
nervous system cannot process information for intervals shorter than 30 ms (e.g. the
localisation of objects in auditory space requires a much higher temporal resolution; for
detailed explanations concerning microsecond timing, see section 3.4.1). However, distinct
events require a minimum of 30 ms to be perceived as successive.
Support for distinct system states comes from a variety of studies using different
paradigms: Under stationary conditions, response distributions of reaction time (Jokeit, 1990)
or of pursuit eye movements (Pöppel, 1986) show typical characteristics in the sense that
the preferred response latencies are separated by intervals of approximately 30 ms
(see Figure 10). These effects can be explained on the basis of neuronal oscillations.
After the transduction of a stimulus, an oscillation with a period of 30 ms is initiated
that is phase-locked to the stimulus. Such an oscillatory mechanism, under environmental
stimulus control, allows integration of information from different sensory modalities, i.e.,
data from various inputs can be collected within one period, which defines a basic system
state. The separate response modes possibly represent similar successive and discrete
decision-making stages, as is assumed in high-speed short-term memory scanning
(Sternberg, 1966).
[Figure 10: histogram over latency (ms), horizontal axis from 0 to 350; arrows mark the preferred latencies.]
Figure 10 Histogram of 463 latencies of pursuit eye movements in three subjects. Data are summarised in 10 ms bins. Arrows indicate temporal positions of the preferred latencies that are separated by 30 to 40 ms. Data qualitatively from Pöppel (1986).
Further support for the 30-ms-hypothesis is supplied by neurophysiological
observations. The auditory evoked potential in the midlatency region shows an oscillatory
component with a period of 30 ms (Galambos et al., 1981). This component is a sensitive
marker for the anaesthetic state because it selectively disappears during general
anaesthesia (Madler et al., 1987). Thus, oscillations with a period of 30 ms represent
functional system states that are apparently necessary prerequisites for the establishment
of events (Schwender, 1994).
3.4 Neural and Cognitive Models of Time Perception
In a strict sense, time perception should not occur because receptors for what we refer
to as 'time' do not exist. Following the reasoning of the previous section, where the
notion of the subjective present was introduced, time can be regarded as a mental construction
based on sensory processing. Conceptions of the underlying neural functioning as
well as cognitive models of time perception positioned on a higher level of abstraction
are the topics of this section.
A fundamental part of sensory processing is pattern recognition, that is, how central
neurons develop selective responses to spatial and temporal patterns of activity from
environmental stimuli. Sensory stimuli can be decomposed into spatial and temporal
components. Spatial patterns are those that can be discriminated based on a static
'snapshot' of which neurons are active (e.g. retinotopy of cortical activation). Temporal
patterns are those in which the order, duration, or interval between the activation of
neurons is required for stimulus discrimination. The duration of flashed bars of light and
the voice-onset time of phonemes are examples of temporal stimuli that span only a few
orders of magnitude. Altogether the brain processes temporal information
over a range of at least ten orders of magnitude - from microseconds to daily circadian
rhythms (see Figure 11).
[Figure 11: logarithmic time axis in milliseconds from 10^-3 to 10^9, with marks at 1 s, 1 min, 1 h and 1 d; tasks are listed on the left, appropriate models on the right.]
TASK                                                        APPROPRIATE MODEL
Microsecond Processing: Sound Localisation                  Delay Lines, Labelled Lines
Millisecond Processing: Speech Generation/Recognition,      Population Models
  Motion Detection, Motor Coordination
Second Processing: Conscious Time Estimation                Pacemaker-Switch-Accumulator Models
Circadian Rhythm: Appetite, Sleep-Wake
Figure 11 Scales of temporal processing. Humans process temporal information over a scale of at least ten orders of magnitude, executing tasks in the microsecond to the daily scope. At the right side of the figure, appropriate models for particular tasks are listed. There is no sharp border between the use of the appropriate models. Rather they are assumed to overlap the particular tasks. Modified from Buonomano et al. (2002).
3.4.1 Labelled Lines
The Labelled Lines models are used to explain microsecond temporal processing,
which is primarily responsible for the detection of interaural delays used to localise sound
sources. In humans it takes sound approximately 600 µs to 700 µs to travel the distance
between the left and right ear. The auditory system uses these intervals to calculate the
spatial location of the sound source. A relatively simple but extremely sensitive
mechanism is used to determine these microsecond intervals: A sound arriving at each ear will
activate neurons in the cochlear nucleus. The axons from these neurons function as delay
lines; that is, the time an action potential takes is proportional to the distance it has
to travel. Neurons in the medial superior olive function as coincidence detectors and use the
delays to respond selectively to different intervals. Together these neurons establish a
topographic map of auditory space (Carr, 1993). Whereas Labelled Lines models have
proven suitable to explain microsecond processing, they are not well suited for complex
forms of temporal processing such as sequences and speech (Buonomano et al., 2002).
Computationally, the Labelled Lines models are very effective, but only for simple tasks.
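As a rough illustration of the delay-line idea, the following Python sketch estimates the interaural delay from the path difference between the ears and then picks the coincidence detector whose built-in delay best compensates it. The ear distance, the speed of sound, the sine path model and the 50 µs detector spacing are assumptions made for this example; they are not taken from the cited models.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed value for air
EAR_DISTANCE = 0.22      # m, assumed effective acoustic path between the ears


def interaural_delay_us(azimuth_deg):
    """Interaural delay in microseconds for a distant source.

    Simplified path-difference model (an assumption): d * sin(azimuth) / c.
    """
    path_difference = EAR_DISTANCE * math.sin(math.radians(azimuth_deg))
    return 1e6 * path_difference / SPEED_OF_SOUND


def best_detector(itd_us, detector_delays_us):
    """Index of the coincidence detector whose axonal delay difference
    comes closest to the interaural delay."""
    return min(range(len(detector_delays_us)),
               key=lambda i: abs(detector_delays_us[i] - itd_us))


# Hypothetical bank of detectors tuned from -700 to +700 microseconds.
detector_delays = list(range(-700, 701, 50))
itd = interaural_delay_us(90)                 # source directly to one side
winner = detector_delays[best_detector(itd, detector_delays)]
```

With the assumed values, a source directly to one side yields a delay of about 640 µs, which lies within the range quoted above; the position of the winning detector in the bank then stands for the direction of the source.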
3.4.2 Population Clocks (Neural Networks)
In Population Clocks (or population models), time is coded in the population activity
of a network of neurons, where any given neuron will contain little temporal information.
An additional difference from Labelled Lines models is that there is no explicit range
of time constants or time delays specifically set to capture specific intervals. These
models generally rely on local network dynamics and time-dependent changes in network
states, which appear as a result of e.g. plasticity of synaptic delays. Central to 'biologically
feasible' population models are oscillatory pacemaker neurons. The idea of using
oscillators to store an arbitrary temporal sequence was introduced in the sixties by
Longuet-Higgins (1968). Since then a series of refinements has taken place, triggered by
the use of computer simulations.
Figure 12 shows a recent approach aiming to model stored time intervals (Miall, 1996).
This model relies on a large population of pacemakers with only a narrow distribution of
oscillation periods. To store any particular time interval, a unique group of pacemakers is
selected that has the appropriate beat frequency. Consider a group of oscillators
(pacemaker neurons), each with a slightly different frequency of oscillation, and each
spiking for a brief part of each cycle. The beat frequency of any pair of these oscillators is
then the frequency at which they spike simultaneously. Thus their beat frequency is much
lower than their intrinsic oscillation frequency [n]. It is given by the difference between the
frequencies of the two cells. For a population of oscillators the beat period is given
by the lowest common multiple of the periods of their oscillations. A group of a few
hundred pacemaker cells, even with similar oscillation frequencies, can encode a wide
range of time intervals and can recall the interval at a later time.
[n] This is the requirement for storing time intervals in the second range. With such a model it is not necessary to assume pacemaker neurons with a great variability in the oscillation frequency, as other models do.
[Figure 12: panel A shows the spike trains of five oscillators (1 to 5) along a time axis with the marks t0, t1 and t2 and the interval to be stored; panel B shows the network.]
Figure 12 Storing time with oscillating neurons. A: A schematic diagram of activity in five oscillators, indicated by short vertical bars. The interval t0 - t1 can be encoded by selection of those oscillators active both at t0 and at t1 (oscillators 1, 2, 5). B: The network: a heterogeneous population of oscillators jointly excites an output neuron, which sums incoming activity and fires when a threshold is reached. Modified from Miall (1996).
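The selection principle of Figure 12 can be sketched in a few lines of Python. This is an idealised, noise-free illustration and not Miall's simulation: periods are whole milliseconds, every oscillator spikes exactly at multiples of its period after a common reset at t0 = 0, and the period values and function names are assumptions chosen for the example.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Lowest common multiple of two positive integers."""
    return a * b // gcd(a, b)


def pair_beat_frequency_hz(period_a_ms, period_b_ms):
    """Beat frequency of two oscillators: difference of their frequencies."""
    return abs(1000.0 / period_a_ms - 1000.0 / period_b_ms)


def select_oscillators(periods_ms, interval_ms):
    """Storage: keep the oscillators that spike at t0 = 0 and again at t1."""
    return [p for p in periods_ms if interval_ms % p == 0]


def recalled_interval_ms(selected_periods_ms):
    """Recall: first time after a synchronous reset at which all selected
    oscillators spike together again (their common beat period)."""
    return reduce(lcm, selected_periods_ms)


periods = [80, 90, 100, 110, 120, 125]          # hypothetical periods in ms
selected = select_oscillators(periods, 3600)    # store an interval of 3.6 s
recalled = recalled_interval_ms(selected)
```

In this example the oscillators with periods of 80 ms and 100 ms (12.5 Hz and 10 Hz) beat at 2.5 Hz, far below their intrinsic frequencies, and the selected group as a whole first coincides again after the stored 3.6 s.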
Computer simulations of this model show both impressive characteristics and severe
weaknesses regarding its comparability to biological systems: With such a model it is
neither necessary to assume unrealistically accurate pacemaker neurons, nor to assume
that they fire with an unrealistically wide range of periods (e.g. from tens of milliseconds
to tens of seconds). Furthermore, the model is very robust against noise: large random
fluctuations of the pacemaker neurons have little impact on the system's behaviour. But as
soon as there is a directional shift instead of a random fluctuation of the units' periods,
recall is poor. A further failure to mimic biology is the relationship between interval
duration and accuracy: The networks, as modelled, are either accurate or they fail. There
is no distribution of responses about the desired time that might lead to the typical
Weber's Law relationship between errors and duration. The remaining difficulty with the
model presented here is that the group of selected units encoding a particular time
interval or sequence needs to be synchronously reset to allow recall of the stored interval.
This is possible, but would require some powerful reset signal to reach the entire group of
oscillators.
3.4.3 Pacemaker-Switch-Accumulator Models
Positioned on a higher level of abstraction than Labelled Lines and Population Clocks
is a class of models referred to here as Pacemaker-Switch-Accumulator models.
As the name suggests, central to these models is a three-step process beginning with a
pacemaker unit that emits pulses (whose rate can be increased or decreased). These
pulses are gated to an accumulator through a switch, which can be closed (so that
pulses pass) or open (pulses cannot pass). The closure of the switch is triggered by
incoming significant temporal information, its opening by the end of the temporal episode
to be estimated (Church, 1984; Gibbon et al., 1984). The accumulator is a perceptual
store similar to an 'up' counter incremented by the pulses that have passed the switch.
The Temporal Information Processing Model (TIP) of Figure 13 is an approach that uses
the Pacemaker-Switch-Accumulator model at its core. It explains the variance of duration
estimations in humans and animals. The model stems from experiments on animal timing
behaviour in which particular durations were reinforced by means of Classical Conditioning
procedures. TIP was developed from the animals' recall behaviour for the reinforced
durations. TIP also suits human duration estimation well, where consciously learned
durations, instead of reinforced ones, have to be recalled, although some mechanisms
concerning attention and arousal are not fully clarified. For attention and arousal-related
work see e.g. Treisman et al. (1990), Block (2001), or Zackay (1998).
[Figure 13: block diagram with three levels. Clock Level: Pacemaker, Switch, Accumulator. Memory Level: Working Memory (t), Reference Memory (n*). Decision Level: Comparator with threshold b*; YES if abs (t-n*)/n* < b*, NO if abs (t-n*)/n* > b*.]
Figure 13 The Temporal Information Processing Model (TIP) composed of the three interacting levels: clock, memory, and decision. Modified from Church (1984).
Following the Pacemaker-Switch-Accumulator level of the TIP model, two additional
levels are introduced and discussed here. The memory level includes a short-term memory
store (working memory), which is functionally equivalent to the accumulator, and a
long-term store (reference memory), to which reinforced (or learned) durations are
transferred at the end of a trial.
Finally, at the decision level, a comparator compares the number of pulses t currently
in the short-term store with a random sample n* from the reference memory for the
standard duration, which is represented as a Gaussian distribution. A decision as to whether
or not to respond depends on the comparison of the absolute difference between t and n*,
expressed as a fraction of n*, with a threshold b*, which is a random value drawn from a
Gaussian distribution. Thus, the condition for responding is expressed as
[abs (n* - t)/n* < b*], with abs indicating the absolute value. If this normalised
difference is less than the threshold, responding is initiated (see Church et al. (1994)
for an application of this model).
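The three levels described above can be put together in a minimal Python sketch. It is an illustration only, under stated assumptions: the pacemaker is taken to be perfectly regular, and the pulse rate, the means and standard deviations of n* and b*, and all function names are hypothetical values chosen for the example, not parameters reported in the cited work.

```python
import random


def accumulate(pulse_rate_hz, duration_s, closure_latency_s=0.0):
    """Clock level: number of pulses that pass the switch.

    Pulses are only counted once the switch has closed, i.e. after the
    closure latency has elapsed.
    """
    timed = max(0.0, duration_s - closure_latency_s)
    return int(pulse_rate_hz * timed)


def respond(t, n_star, b_star):
    """Decision level: respond if abs(n* - t) / n* < b*."""
    return abs(n_star - t) / n_star < b_star


def tip_trial(t, rng, n_mean=30.0, n_sd=3.0, b_mean=0.15, b_sd=0.03):
    """One decision with n* and b* drawn from Gaussian distributions
    (all four distribution parameters are assumptions)."""
    n_star = rng.gauss(n_mean, n_sd)   # sample from reference memory
    b_star = rng.gauss(b_mean, b_sd)   # sample of the decision threshold
    return respond(t, n_star, b_star)


rng = random.Random(0)                               # fixed seed, repeatable
pulses = accumulate(pulse_rate_hz=10.0, duration_s=3.0)
decision = tip_trial(pulses, rng)
```

The sketch also makes the two explanations discussed in the literature tangible: a faster pacemaker or a shorter closure latency both increase the pulse count for the same physical duration, and therefore the subjective duration.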
Pacemaker-Switch-Accumulator models, including TIP, account for (or have been
developed because of) several effects in human time perception showing that the subjective
duration of a stimulus can be influenced by factors other than its actual physical length.
For example, stimuli that are 'filled' (e.g. continuous tones) are usually perceived as
longer than equal-length stimuli that are 'empty' (Thomas et al., 1974). Likewise, moving
stimuli have been judged as lasting longer than static ones (Goldstone et al.,
1974; Brown, 1995), and presentations of familiar words were judged as lasting longer than
those of unfamiliar ones (Witherspoon et al., 1985). Frequent results from the classical
timing literature are that more intense stimuli tend to be judged as lasting longer than
less intense ones (Fraisse, 1964), and that 'sounds are judged longer than lights'
(Goldstone et al., 1974). The latter refers to the phenomenon that auditory stimuli
frequently appear to have longer subjective durations than visual stimuli of the same
real-time length.
Explanations for these effects mainly concern either the closure latency of the switch
or the speed of the pacemaker. The closure latency of the switch is supposed to exceed
its opening latency, and this difference might depend on the modality or the degree of
expectation of the temporal signal (Lejeune, 1998). Alternatively, pacemaker speed can be
increased or decreased by arousing or calming stimuli (Boltz, 1994; Wearden et al., 1999).
3.5 Psychophysical Theory for Measuring Thresholds
As mentioned in the introduction to this thesis, we consider the psychophysical
approach suitable for measuring the perception and acceptance thresholds of particular
delay parameters. In what follows, the fundamentals of psychophysical testing, a
specification of the psychometric function as well as the adaptive psychophysical
procedure applied in the empirical part of the thesis are described.
3.5.1 Testing paradigms
Psychophysical procedures offer various testing paradigms, of which we describe
the yes-no and the forced-choice (nAFC: n-Alternative-Forced-Choice) mode. In the yes-no
mode subjects are given a series of trials, in each of which they must judge the presence
or absence of a stimulus. The ratio between the number of trials containing a
stimulus and the total number of trials is usually 0.5, but can be any other value. Usually
the subject is told this ratio in advance. The rate of yes-responses for all tested
stimulus intensities is defined as the dependent variable.
A fundamentally different testing mode is the forced-choice mode: Subjects are
given n alternatives, from which they have to choose the one containing the
stimulus. The alternatives are presented with either spatial or temporal coincidence, or
without either coincidence. The subjects know that exactly one alternative contains the
stimulus, and that the rest contain no stimulus. The differences between these two
methods become obvious when the presented stimuli are faint. In the yes-no paradigm the
proportion of yes-answers approaches zero, whereas in the forced-choice paradigm the
proportion of correct answers approaches the value of equal probability for all
alternatives, which is the reciprocal of the number of alternatives. This means
that e.g. in two-alternative forced-choice (2AFC) tasks the threshold is located where
observers give 75% correct responses, since they already give 50% correct responses
due to the guessing inherent in 2AFC. The basic advantage of 2AFC consists of its
well-founded assumption that subjects will opt for the stimulus evoking the strongest
perception, regardless of their tendency to say 'yes' or 'no'. This is in contrast to the yes-no
paradigm, where decision making in the presence of uncertainty depends on the
subject's psychological characteristics, e.g. prudence. Unlike in the yes-no mode, the
dependent variable of nAFC is the rate of correct responses for all tested stimuli instead of
the rate of yes-responses. In the following we subsume both kinds of dependent
variables under the term positive-response rate ψ.
For most psychophysical testing, be it in the clinic or in the research lab, efficiency
is of great importance, i.e. the threshold should be estimated with satisfactory accuracy
after as few trials as possible. The requirement of a minimal number of trials stems from
the fact that after a long run of trials experimental subjects tend to become fatigued and
bored, resulting in an apparent drift of their thresholds. For this reason, so-called adaptive
psychophysical procedures have been developed, whose primary purpose is to minimize the number
of trials. We will recapitulate the adaptive procedure called best-PEST in section 3.5.3; for
more details about adaptive procedures see the overview of Treutwein (1995). In the next
section we describe the theoretical background necessary to understand this procedure.
3.5.2 Specification of the Psychometric function ψ = f(φ)
The psychometric function assigns a positive-response rate ψ to the range of stimulus
intensities. The particular properties of this function are described in the following:
The range of ψ is bounded below by the probability of giving positive responses
without perceiving the stimulus (false positive or false alarm rate). This false positive rate
consists of a methodical part (only in nAFC), and the 'proper' false positive rate ε. The
methodical part is equal to the reciprocal of the number of alternatives n. The upper limit
of ψ is (1-δ): large stimulus intensities produce positive responses in virtually all
cases, reduced only by the false negative rate (i.e. misses) δ. The error terms δ and ε are
caused, for instance, by observers' inattention or fatigue.
$$\psi_{-\infty} = P(\text{positive response} \mid \phi \to -\infty) = \frac{1}{n} + \varepsilon \qquad \text{eq (1)}$$
$$\psi_{+\infty} = P(\text{positive response} \mid \phi \to +\infty) = 1 - \delta \qquad \text{eq (2)}$$
φ : stimulus intensity {φ ∈ ℝ}
ε : false positive {ε ∈ ℝ | 0 ≤ ε ≤ 0.5}
δ : false negative {δ ∈ ℝ | 0 ≤ δ ≤ 0.5}
n : number of alternatives {n ∈ ℕ | 2 ≤ n ≤ 100} [o]
We define the threshold θ to be that value of stimulus intensity that yields a specified
positive-response rate. For practical reasons in testing, the threshold is located at the
steepest slope of the psychometric function (for the derivation see section 3.5.3). In the
following we will exemplify the psychometric function by means of the logistic model, because
this is the kernel function of the adaptive procedure best-PEST, which is the topic of
section 3.5.3:
$$\psi^{*}(\phi) = \left(1 + e^{\beta^{*}(\theta - \phi)}\right)^{-1} \qquad \text{eq (3)}$$
ψ* : kernel function
β* : steepness parameter
θ : threshold
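Since the logistic function is rotationally symmetric about the inflection point, the
threshold is in the middle of the response range [ψ_{-∞}, ψ_{+∞}]. Therefore, the rate of
positive responses at threshold is:
$$\psi_{\theta} = 0.5\,(\psi_{-\infty} + \psi_{+\infty}) = 0.5\left(1 - \delta + \frac{1}{n} + \varepsilon\right) \qquad \text{eq (4)}$$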
In order to create a formal link between the two testing paradigms, the yes-no situation
can be considered as a forced-choice situation with an infinite number of alternatives. In this
case the threshold converges to the value where the positive-response rate is:
[o] The number of alternatives is restricted to 100, since the practicability of experiments with more than 100 alternatives is doubtful.
$$\psi_{\theta}(\text{Yes/No}) = \lim_{n \to \infty} 0.5\left(1 - \delta + \frac{1}{n} + \varepsilon\right) = 0.5\,(1 - \delta + \varepsilon) \qquad \text{eq (5)}$$
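The location of the threshold on the response axis can be checked numerically. The short Python sketch below evaluates the midpoint of the response range for a given number of alternatives; the function name and the convention of passing `n=None` for the yes-no limit are assumptions of this example.

```python
def rate_at_threshold(n=None, eps=0.0, delta=0.0):
    """Positive-response rate at threshold, i.e. the middle of the response
    range. n is the number of alternatives; n=None stands for the yes-no
    situation (limit of infinitely many alternatives)."""
    guessing = 0.0 if n is None else 1.0 / n   # methodical part of false positives
    lower = guessing + eps                     # lower asymptote
    upper = 1.0 - delta                        # upper asymptote
    return 0.5 * (lower + upper)


two_afc = rate_at_threshold(n=2)      # error-free 2AFC
four_afc = rate_at_threshold(n=4)     # error-free 4AFC
yes_no = rate_at_threshold()          # error-free yes-no
```

Without observer errors this reproduces the 75% criterion for 2AFC and the 50% criterion for the yes-no mode; non-zero error rates shift both values slightly.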
The psychometric function ψ*(φ) has to be adjusted for the observers' false positive
and false negative rates. For this purpose the kernel function is shifted by n^{-1} + ε
and scaled to the response range [ψ_{-∞}, ψ_{+∞}], whose width is - according to eq (1)
and eq (2) - equal to |1 - δ - n^{-1} - ε|:
$$\psi(\phi) = n^{-1} + \varepsilon + \left(1 - \delta - n^{-1} - \varepsilon\right)\left(1 + e^{\beta^{*}(\theta - \phi)}\right)^{-1} \qquad \text{eq (6)}$$
ψ(φ) : adjusted psychometric function
In order to deal with a well-known constant, which is comparable between different
magnitudes of stimuli, we let β be the slope at the inflection point of the normalized
psychometric function. We define the threshold to be at a stimulus intensity of 0.5, thus we
normalize the stimulus intensity to two threshold units, with the result of obtaining the
'real' slope in an equal-scaled plot (i.e. the slope is equivalent to the tangent of the
gradient angle):
$$\beta = \left.\frac{d\psi}{d\phi}\right|_{\phi=\theta} = \frac{\beta^{*}\left(1 - \delta - n^{-1} - \varepsilon\right)}{4}, \quad \text{that is} \quad \beta^{*} = \frac{4\beta}{1 - \delta - n^{-1} - \varepsilon} \qquad \text{eq (7)}$$
β : slope of the psychometric function at threshold (inflection point)
eq (7) inserted in eq (6) leads to:
$$\psi(\phi) = n^{-1} + \varepsilon + \left(1 - \delta - n^{-1} - \varepsilon\right)\left(1 + e^{\frac{4\beta(\theta - \phi)}{1 - \delta - n^{-1} - \varepsilon}}\right)^{-1} \qquad \text{eq (8)}$$
Equation eq (8) is the underlying, generic formula for the threshold estimation by the
best-PEST calculator. Figure 14 depicts the mapping of eq (8) with different parameter
settings:
n ∈ {2, 4, ∞}
β ∈ {1.5, 3, 7}
ε = 0.07
δ = 0.04
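To make the roles of the parameters concrete, the adjusted logistic function can be written as a small Python function. This is a sketch, assuming the form in which the logistic kernel is shifted by 1/n + ε and scaled to the response range; the function name, the default arguments and the use of `n=None` for the yes-no case are choices of this example.

```python
import math


def psychometric(phi, theta=0.5, beta=3.0, n=None, eps=0.07, delta=0.04):
    """Adjusted logistic psychometric function.

    phi: stimulus intensity, theta: threshold, beta: slope at threshold,
    n: number of alternatives (None = yes-no), eps / delta: false positive
    and false negative rates.
    """
    guessing = 0.0 if n is None else 1.0 / n
    elevation = guessing + eps              # lower asymptote
    scale = 1.0 - delta - elevation         # width of the response range
    kernel = 1.0 / (1.0 + math.exp(4.0 * beta * (theta - phi) / scale))
    return elevation + scale * kernel


# Numerical check that beta really is the slope at the threshold.
h = 1e-6
slope_2afc = (psychometric(0.5 + h, n=2) - psychometric(0.5 - h, n=2)) / (2 * h)
```

Dividing the exponent by the width of the response range is what keeps the slope at threshold equal to β regardless of n, ε and δ, so that curves for different paradigms remain comparable.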
[Figure 14: plot of the positive-response rate ψ (0.00 to 1.00) against the normalized stimulus intensity φ [2θ] (0.00 to 1.00) for the yes-no and nAFC situations, with curves for β = 1.5, β = 3.0 and β = 7.0.]
Figure 14 Logistic psychometric graphs depicting yes-no and forced-choice situations (nAFC). The asymptotes are at (1/n + ε), and at (1 - δ). The slope β is 3 (solid lines), and 7 and 1.5 (dashed lines) respectively. The stimulus intensity is normalized to 2 threshold units.
Typically psychometric functions are - as depicted in Figure 14 - statistical in nature
(unless they represent a Heaviside step function with its 'step' at the threshold value).
That is, when an observer is presented on several occasions with the same stimulus, he or she
is likely to respond yes on some trials and no on other trials. Thus, the threshold cannot
be defined as the stimulus value below which detection never occurs and above which
detection always occurs, but rather as the stimulus value which is perceptible in a
predefined percentage (usually 50%) of the trials. Experimenters are confronted with the
question of how to determine the psychometric function of an experimental subject or of a
particular study cohort. For that purpose classical psychophysics offers several methods,
which we will not explain here in detail. Readers interested in this topic may consult the
standard work of Gescheider (1997). To recapitulate, with these methods we
determine the detectability of several stimulus intensities, and fit an appropriate
sigmoid-shaped curve to the data to obtain the psychometric function. From this function the
50% threshold, for instance, can be read off.
In order to measure the empirical threshold, the experimenter must decide what
stimulus intensities should be used in the experiment. It should be clear that choosing
intensities that are all far above or below the threshold would provide little information
leading to an accurate estimation of the threshold. In addition to the problem of
requiring a large number of trials to obtain the threshold, wasted trials are likely to occur
with these methods, unless the testing range is known in advance. An approach with these
characteristics is far from optimally efficient, and consequently the adaptive methods for
measuring thresholds have evolved.
3.5.3 Adaptive Psychophysical Procedures
In all adaptive procedures, the intensity of a stimulus presented on a particular trial is
determined by the observer's performance in detecting stimuli presented on prior trials.
Except for one class of procedures called maximum-likelihood methods, all methods
described in Gescheider (1997) suggest more or less heuristic rules for after how many trials
and by how much the presented stimulus intensity has to be adjusted. Even though it is a
characteristic of all adaptive procedures to recall information from the past history of an
experimental run, only the maximum-likelihood procedures determine the next stimulus
presentation based on a statistical estimation of the observer's threshold, which is made
from all of the results obtained from the beginning of the run. The statistical technique
of maximum-likelihood estimation assumes that the underlying psychometric function
has a specific form. For example it could be a Gaussian (the cumulative normal
distribution), logistic, Weibull, or some other sigmoid-shaped function. Because these functions
have similar forms, the estimated thresholds are not greatly different, and the choice may
only be of importance if e.g. a particular perception model is under test. In the following
we describe the best-PEST method (Pentland, 1980), which uses the logistic function as
underlying model [p].
Maximum-Likelihood: best-PEST
In best-PEST the approach taken to the problem of determining a threshold is to
maximise the information gained with each measurement. In so doing the smallest
possible number of measurements will be required. First we derive the choice of the sampling
point on the psychometric function:
For any value φ of the stimulus range [0,k], there is a probability ψ of a positive
answer. Given N samples taken at φ, of which p were positive, our estimate of ψ is:
$$\hat{\psi} = \frac{p}{N} \qquad \text{eq (9)}$$
ψ̂ : estimate of the probability of a positive response
p : number of positive responses
N : number of samples
the variance is
$$\sigma = \frac{\hat{\psi}\,(1 - \hat{\psi})}{N} \qquad \text{eq (10)}$$
σ : variance of estimation
and the confidence intervals are
$$CI_{\hat{\psi}} = w\sqrt{\sigma} \qquad \text{eq (11)}$$
CI_ψ̂ : width of the confidence interval about ψ̂
w : level of desired confidence (e.g. 0.95)
Equations eq (9) and eq (10) inserted in eq (11) lead to
$$CI_{\hat{\psi}} = w\sqrt{\frac{p\,(N - p)}{N^{3}}} \qquad \text{eq (12)}$$
[p] PEST is the acronym for Parameter Estimation by Sequential Testing.
To get the stimulus range φ corresponding to the confidence interval of the dependent
variable, the latter has to be divided by the slope of the psychometric curve:
$$CI_{\phi} = \frac{CI_{\hat{\psi}}}{d\psi / d\phi} \qquad \text{eq (13)}$$
CI_φ : width of the confidence interval about φ
Thus, in order to minimise the estimated confidence interval about the stimulus φ for
a given number of trials we have to maximise the slope of the psychometric function.
For all sigmoid-shaped functions, the steepest slope is located at the inflection point. In
the rotationally symmetric logistic function used in best-PEST this point is at the centre
of the curve. In the yes-no mode this is at 50% (if E=0 and S=1); in the 2AFC mode this
is at 75% (if E=0.5 and S=0.5).
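The argument can be checked numerically for the yes-no case. The Python sketch below computes the width of the confidence interval about the stimulus for a logistic kernel and searches a grid of sampling points for the narrowest interval. The multiplier `w = 1.96`, the steepness of 12, the 50 trials and the grid are assumptions made for this illustration, and the expected response rate is used in place of an observed count.

```python
import math


def ci_psi_from_counts(p, n_trials, w):
    """Width of the confidence interval about the estimated response rate,
    written in terms of the counts: w * sqrt(p (N - p) / N^3)."""
    return w * math.sqrt(p * (n_trials - p) / n_trials ** 3)


def logistic(phi, theta=0.5, beta_star=12.0):
    """Logistic kernel for the yes-no case (no false positives or misses)."""
    return 1.0 / (1.0 + math.exp(beta_star * (theta - phi)))


def ci_phi(phi, n_trials=50, w=1.96, theta=0.5, beta_star=12.0):
    """Confidence interval about the stimulus when sampling at phi:
    interval about the response rate divided by the local slope."""
    psi = logistic(phi, theta, beta_star)
    ci_psi = w * math.sqrt(psi * (1.0 - psi) / n_trials)
    slope = beta_star * psi * (1.0 - psi)      # derivative of the logistic
    return ci_psi / slope


grid = [i / 100 for i in range(1, 100)]
best_phi = min(grid, key=ci_phi)               # sampling point with narrowest CI
```

For these settings the narrowest interval is obtained when sampling exactly at the threshold, which is the point of steepest slope.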
In order to explain the best-PEST procedure we reformulate eq (8) and obtain the
probability of getting a positive (if r=1) or negative (if r=-1) response at the i-th trial:
$$P(r_i \mid \phi) = E + S\left(1 + e^{\,r_i(\hat{\theta}_i - \phi)\,4\beta S^{-1}}\right)^{-1} \qquad \text{eq (14)}$$
r_i : response of the observer at the i-th trial, r_i ∈ {1, -1}
θ̂_i : i-th estimate of the threshold
E : elevation of the psychometric function according to eq (1)
S : scaling of the psychometric function to the response range according to eq (1) and eq (2)
The strategy in best-PEST is to calculate the likelihood of the sampling point being
at each point within the testing range, and to take as the new estimate the stimulus value
that is assigned the highest probability. After N-1 trials, we find the N-th point of
measurement by solving:
$$\hat{\theta}_N = \operatorname*{arg\,max}_{\phi \in (0,k)} \prod_{i=1}^{N-1} P(r_i \mid \phi) \qquad \text{eq (15)}$$
where (0, k) is the test range of the stimulus θ, and (θ_i, r_i) denotes the results of the
i-th measurement that was taken at value θ_i.
The maximum likelihood estimator is known to be asymptotically efficient and unbiased.
One problem arises: the product of all the probability distributions approaches zero
for large numbers of trials. To overcome this problem, we apply a logarithmic
transformation to the likelihood function, with the result of obtaining the sum instead of the
product of all likelihood functions. That way, the log-likelihood functions do not run into
underflow and need not be standardised to the overall probability of 1. Since the
logarithmic transformation is strictly monotonically increasing, the locations of relative maxima
are preserved:
$$\operatorname*{arg\,max}_{x \in (a,b)} \prod_{i=1}^{N} f_i(x) = \operatorname*{arg\,max}_{x \in (a,b)} \sum_{i=1}^{N} \log f_i(x) \qquad \text{eq (16)}$$
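The underflow problem and its remedy are easy to reproduce with ordinary double-precision numbers. In the Python sketch below the 2000 probabilities of 0.5 and the three likelihood values are arbitrary numbers chosen for the illustration.

```python
import math

# Product of many probabilities: underflows to exactly zero.
probabilities = [0.5] * 2000
product = 1.0
for p in probabilities:
    product *= p

# Sum of the logarithms: stays finite and well representable.
log_sum = sum(math.log(p) for p in probabilities)

# The position of the maximum is unchanged by the logarithm.
likelihoods = [0.1, 0.4, 0.2]
argmax_plain = max(range(len(likelihoods)), key=lambda i: likelihoods[i])
argmax_log = max(range(len(likelihoods)), key=lambda i: math.log(likelihoods[i]))
```

The plain product is no longer distinguishable from zero, whereas the sum of logarithms remains an ordinary negative number and selects the same maximum.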
For the function used here, eq (14), the N-th threshold estimate is calculated
according to eq (15) and eq (16):
$$\hat{\theta}_N = \operatorname*{arg\,max}_{\phi \in (0,k)} \sum_{i=1}^{N-1} \log\left(E + S\left(1 + e^{\,r_i(\hat{\theta}_i - \phi)\,4\beta S^{-1}}\right)^{-1}\right) \qquad \text{eq (17)}$$
Figure 15 depicts the expansion of the log-likelihood functions according to eq (17).
The parameter settings in Table 9 are used:
Table 9 Parameter settings of the curves depicted in Figure 15.
Parameter    A = 2AFC                              B = yes/no
N            10                                    10
E            0.5                                   0
S            0.5                                   1
β            2                                     2
r            {1, 1, -1, 1, 1, -1, 1, 1, 1, -1}     {1, -1, 1, -1, 1, 1, -1, -1, 1, -1}
Figure 15 Expansion of the log-likelihood functions in the stimulus interval [0, k] of the adaptive procedure best-PEST (horizontal axis: stimulus intensity φ). Circles indicate the relative maxima; dashed lines show the progression of the threshold convergence. Bold lines represent the predefined initialisations; thin lines are calculated according to the responses r. A: 2-alternative forced-choice (2AFC) paradigm. B: yes-no paradigm.
4 Experiments
This chapter consists of the description and the results of the conducted threshold determination experiments. The first part concerns experiments in which single subjects interact with the computer, aiming to determine perception thresholds for relative and absolute delays. Relative delay thresholds are obtained for auditory and visual stimuli in both orders. Absolute delay thresholds are obtained for the interaction with either vocal or mouse input, and the corresponding visual response. The second part describes experiments in which pairs or triples of subjects interact over a videoconference using an emulated communication network. This part consists of experiments aiming to determine absolute delay thresholds for basic auditory and visual interaction as well as for realistic communication tasks.
4.1 In Human-Computer Interaction (HCI) Mode
The experiments conducted in the human-computer interaction (HCI) mode comprise
threshold determinations where the experimental subjects receive computer-generated
stimuli triggered by the subjects' inputs. Such experiments can be conducted without
using an emulated communication network, and require single subjects only. This makes
these experiments easier to control, since no group dynamics are involved. Furthermore,
owing to the plain technical infrastructure of such experiments, much less effort is
needed to install and calibrate the set-up. The HCI experiments consist of the
following threshold determinations:
• Relative delay thresholds: auditory before visual (condition AV), and visual
before auditory (condition VA) (Zuberbühler et al., 2002).
• Absolute delay thresholds: vocal trigger - visual response (condition VocVis),
and mouse trigger - visual response (condition MouVis) (Zuberbühler et al., 2003).
4.1.1 Experimental Setup
The experimental set-up, including stimulus presentation, best-PEST algorithm, and
data acquisition, was implemented in a fully computerised environment using Macromedia
Director's object-oriented scripting language Lingo. The temporal resolution
of the entire system was in the range of ±5 ms, whereas the minimal increment administered
by the adaptive procedure was set to 10 ms. The settings of the particular
parameters are listed in Table 10.
Table 10 Parameter settings for threshold determinations in HCI (for an explanation of the parameters see the Annex on page 101).
Parameter                  Relative Delay (AV and VA)    Absolute Delay (VocVis and MouVis)
Mode                       2AFC                          2AFC
Start value k              400 ms                        610 ms
Smallest step size         10 ms                         10 ms
Termination criterion      12 trials                     12 trials
Slope of best-PEST β       1.75                          1.75
False negative δ           0                             0
False positive ε           0                             0
Mean of x trials           3                             3
Runs per subject           3                             2 + 1 training
4.1.2 Procedure
We used 2AFC tasks, and applied the adaptive procedure called best-PEST, suggested
by Pentland (1980), in all experiments investigating HCI. Best-PEST is described in section
3.5.3 (page 59). 2AFC tasks are designed to keep the observers' decision criterion from
biasing the results, and best-PEST is assumed to deliver thresholds after the smallest
possible number of trials. In all experiments the subjects received their instructions as
written text, displayed at the appropriate time during the experimental run. Their task
was to detect whether a delay appeared on the right or on the left side of the screen. The
side on which the delay appeared was randomly balanced. In addition, every six trials we
presented one intermittent trial with a large delay. This trial neither contributed to the
results nor influenced the best-PEST estimation. This procedure was chosen
because pre-tests had shown that subjects tended to become bored after reaching their
approximate threshold (at this point the stimulus is very faint). The presentation of an
intermittent trial with a large delay gives the subject an experience of success, resulting in
increased motivation.
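The trial sequence described above can be sketched as follows. This is an illustration only, not the original Lingo code. It assumes that "every six trials" means one intermittent trial after each block of six adaptive trials, and that balancing means an equal number of left and right presentations in random order. The function name `build_schedule` and the dictionary keys are hypothetical.

```python
import random


def build_schedule(n_adaptive=12, catch_every=6, seed=None):
    """Return a list of trials with balanced sides and interleaved catch trials."""
    rng = random.Random(seed)
    # Equal numbers of left and right presentations, shuffled (assumed balancing rule).
    sides = ["left", "right"] * (n_adaptive // 2) + ["left"] * (n_adaptive % 2)
    rng.shuffle(sides)

    schedule = []
    for index, side in enumerate(sides, start=1):
        # Adaptive trials feed the best-PEST estimate.
        schedule.append({"kind": "adaptive", "side": side})
        if index % catch_every == 0:
            # Catch trials use a large delay and are excluded from the estimate.
            schedule.append({"kind": "catch", "side": rng.choice(["left", "right"])})
    return schedule


schedule = build_schedule(seed=1)
adaptive_only = [trial for trial in schedule if trial["kind"] == "adaptive"]
```

Only the entries in `adaptive_only` would be passed on to the threshold estimation; the catch trials serve motivation alone.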
In the following we describe those parts of the procedure that were not common to
all threshold determination experiments.
Relative Delay: Auditory before Visual (AV) and Visual before Auditory (VA)
The delay occurred between the presentation of a black disc (diameter of 4 arc degrees)
on a yellow background and the presentation of a 1 kHz tone of 60 dBA with a
rise/fall time of 10 ms. Both disc and tone lasted for 500 ms; therefore stimulus onsets as
well as stimulus offsets served as cues for the delay. The stimulus order (auditory before visual or
vice versa) was chosen randomly. That way subjects could not gain insight into the logic of
the best-PEST procedure, and should not have been able to predict the next trials.
[Figure 16: timing diagram of the visual and auditory stimuli; time axis t [ms], 0 to 1750]
Figure 16 Test sequence of one trial. The duration of Δt is equal to the maximum-likelihood estimate of the threshold computed in the best-PEST procedure. The occurrence of Δt is randomly balanced between the left and the right side of the screen. The question after each sequence was: »On which side did you perceive a time difference between sound and picture?«
Seven female and nine male subjects (aged between 25 and 56, mean=32) participated
in the 20 minute experiment. The experimental design was within-group, i.e. all subjects
performed all conditions. The 1 kHz hearing threshold level and the visual acuity of the
subjects were tested by means of the audiometer Bosch ST10 and the Landolt-ring acuity
chart. All subjects had normal hearing and normal or corrected-to-normal vision. Each
subject had to complete the threshold procedure three times for both orders, resulting in
a total of 96 threshold estimations. The median of the three threshold estimations per person
and order was taken for further analysis.
Absolute Delay: Vocal Trigger - Visual Response (VocVis)
The delay occurred between a vocal trigger and the disappearance of a visual stimulus
on the screen. As visual stimulus, a black cipher with a height of 4 arc degrees appeared
on a white background. The subjects were told to pronounce the displayed cipher. As
soon as the sound level of their voice exceeded 65.5 dBA the cipher disappeared either
after a small, system-inherent delay or after the delay calculated by best-PEST. A sound
level of 65.5 dBA was chosen as the trigger point because (1) this sound level represented
the average voice level of subjects and (2) it is high enough to avoid the disappearance of
the cipher due to background noise.
[Figure 17: timing diagram of voice input and visual stimulus; T = trigger at 65.5 dBA, D = disappearance of the visual stimulus; time axis t [ms]]
Figure 17 Test sequence of one trial of the voice-visual interaction threshold experiment. The duration of Δt is equal to the maximum-likelihood estimate of the threshold computed in the best-PEST procedure. The occurrence of Δt is randomly balanced between the left and the right side of the screen. t_0 is the average response latency of the microphone device. The question after each sequence was: »On which side did you perceive a delayed disappearance of the cipher?«
Absolute Delay: Mouse Trigger - Visual Response (MouVis)
The delay occurred between a mouse trigger and the appearance of a visual stimulus
on the screen. The visual stimulus consisted of a red square with a side length of 5 arc
degrees on a white background. The subjects were told to click into a white square,
whereupon the square changed its colour to red, either immediately or after a delay
calculated by best-PEST.
[Figure 18: timing diagram of mouse trigger and visual stimulus; T = trigger on mouse up; time axis t [ms]]
Figure 18 Test sequence of one trial of the click-visual interaction threshold experiment. The duration of Δt is equal to the maximum-likelihood estimate of the threshold computed in the best-PEST procedure. The occurrence of Δt is randomly balanced between the left and the right side of the screen. The question after each sequence was: »On which side did you perceive a delayed change of colour?«
Seven female and 17 male subjects (aged between 19 and 41) were recruited for the
two experiments testing two different input modalities (VocVis and MouVis). The
experimental design was within-group, i.e. all subjects performed all conditions. Each of the
experiments lasted approximately 20 minutes. Each subject had to complete the threshold
procedure three times for both modalities, resulting in a total of 144 threshold estimations.
The first threshold estimation in each condition was considered as practice and was
therefore excluded from further analysis.
4.1.3 HCI-Results
Relative Delay: A=Auditory before Visual (AV) and B=Visual before Auditory (VA)
Figure 19 shows the logistic psychometric functions for the perception of relative delays
(curve fitting by means of the method of least squares). The 75 % thresholds obtained
are 74 ms (AV) and 98 ms (VA), respectively. The thresholds obtained by the
adaptive procedure best-PEST are 71 (±17) ms for the AV condition and 105 (±25)
ms for the VA condition (numbers in brackets give the 95 % confidence intervals; see
Figure 20). A one-sided, paired t-test shows that the mean VA threshold is significantly
higher (p<0.05) than the mean AV threshold. Gender and age had no significant
effect on the detection of relative delays.
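A least-squares fit of a logistic psychometric function can be sketched as follows. The sketch is not the fitting routine used for Figure 19. It assumes a function of the form E + S / (1 + exp(-2β(φ/θ - 1))), which is consistent with the inverse function given in eq (18) and with the 2AFC settings E=0.5 and S=0.5, and it replaces a numerical optimiser by a plain grid search. The data points below are synthetic and serve only to show that the search recovers known parameters; the names `psychometric` and `fit_least_squares` are hypothetical.

```python
import math


def psychometric(delay_ms, theta, beta, E=0.5, S=0.5):
    """Assumed logistic model: rate of correct answers at a given delay."""
    return E + S / (1.0 + math.exp(-2.0 * beta * (delay_ms / theta - 1.0)))


def fit_least_squares(points, thetas, betas):
    """Grid search for the (theta, beta) pair with the smallest squared error."""
    best = None
    for theta in thetas:
        for beta in betas:
            sse = sum((rate - psychometric(delay, theta, beta)) ** 2
                      for delay, rate in points)
            if best is None or sse < best[0]:
                best = (sse, theta, beta)
    return best[1], best[2]


# Synthetic data generated from theta = 80 ms and beta = 2.0 (not measured values).
synthetic = [(delay, psychometric(delay, 80, 2.0)) for delay in range(0, 401, 40)]
theta_fit, beta_fit = fit_least_squares(synthetic, range(40, 161, 5), [1.0, 1.5, 2.0, 2.5])
```

With E=0.5 and S=0.5 the fitted θ is the delay at which 75 % of the answers are correct, which is how the 75 % thresholds are read off the fitted curves.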
[Figure 19: psychometric curves for Auditory before Visual (AV) and Visual before Auditory (VA); horizontal axis: Delay (ms), 0 to 400]
Figure 19 Psychometric functions for perception of relative AV delay (straight line) and relative VA delay (dashed line). Arrows indicate the 75% thresholds, which are at 74 ms (AV), and 98 ms (VA), respectively. The data are fitted with a logistic model.
[Figure 20, panels A (audio before visual, n=16) and B (visual before audio, n=16): delay plotted against number of trials (1 to 11), with plus and minus 95% C.I. and the final threshold]
Figure 20 Delays calculated by best-PEST for every trial. The curves represent the mean of all subjects. Continuous lines show the progression of the threshold convergence. Dashed lines indicate the final thresholds; grey lines indicate the 95% confidence interval. A: Auditory before visual, B: Visual before auditory.
Absolute Delay: VocVis and MouVis
Figure 21 shows the logistic psychometric functions for the perception of absolute delays
(curve fitting by means of the method of least squares). The 75 % thresholds obtained
are 98 ms when vocal inputs trigger visual responses (VocVis), and 65 ms when
mouse clicks trigger visual responses (MouVis). The thresholds obtained by the adaptive
procedure best-PEST are 115 (±23) ms for the VocVis condition and 78 (±14) ms for
the MouVis condition, respectively (numbers in brackets give the 95 % confidence
intervals). A one-sided, paired t-test shows that the mean VocVis threshold is significantly
higher (p=0.01) than the mean MouVis threshold. Gender and age had no
significant effect on the detection of absolute delays in HCI.
[Figure 21: psychometric curves for Mouse Trigger / Visual Response and Voice Trigger / Visual Response; horizontal axis: Delay (ms), 0 to 600]
Figure 21 Psychometric functions for delay perception between mouse trigger and visual response (straight line) as well as between voice trigger and visual response (dashed line). Arrows indicate the 75% thresholds.
[Figure 22, panels A (voice trigger - visual response, n=24) and B (mouse trigger - visual response, n=24): delay plotted against number of trials (1 to 12), with plus and minus 95% C.I. and the final threshold]
Figure 22 Delays calculated by best-PEST for every trial. The curves represent the mean of all subjects. Continuous lines show the progression of the threshold convergence. Dashed lines indicate the final thresholds; grey lines indicate the 95% confidence interval. A: Voice trigger - Visual response, B: Mouse trigger - Visual response.
Table 11 summarises the results of the threshold determinations conducted in the HCI
mode.
Table 11 Summary of results of the HCI experiments.
                                              Relative Delay                   Absolute Delay
                                              AV             VA                VocVis           MouVis
Threshold best-PEST                           71 (±17) ms    105 (±25) ms      115 (±23) ms     78 (±14) ms
Threshold fitted model                        77 ms          98 ms             98 ms            65 ms
Slope β_S of the standardised psychometric
function (threshold at 0.5)                   1.756          2.252             1.111            1.133
Experimental design                           Within-group                     Within-group
Significance difference                       AV < VA (p<0.05)                 VocVis > MouVis (p=0.01)
Significance age (p<0.05)                     no             no                no               no
Significance gender (p<0.05)                  no             no                no               no
4.2 In Human-Human Interaction (HHI) Mode
The experiments conducted in the human-human interaction (HHI) mode comprise
threshold determinations in which the experimental subjects interact with each other
over a videoconference that uses an emulated ATM-network infrastructure. Thus, in a
strict sense these experiments investigate network-mediated HHI. The participating subjects
act as both stimulus producer and stimulus receiver. In contrast to the experiments
described before, the intermediary computer system is not involved in producing stimuli, in
the sense of newly created ones. Rather, it is used to process and transmit the human
expressions, and to reproduce them as realistically as possible. The HHI experiments
consist of the following threshold determinations:
• Absolute delay: Basic auditory interaction between two subjects (condition
AudBas), and basic visual interaction between two subjects (condition
VisBas).
• Absolute delay: Realistic audio-visual interaction between three subjects (condition
AudVisReal), and realistic auditory interaction between three subjects
(condition AudReal).
4.2.1 Experimental Setup
The experimental set-up depicted in Figure 23 consists of two (in the case of basic
interactions) or three (in the case of audio-only and audio-visual interactions) videoconference
stations. Each station accommodates the so-called ETHMICS Kubus (which contains the ETHMICS
videoconferencing system developed at the Computer Engineering and Networks Laboratory
(Rothlisberger, 1998), the ATM transmission hardware, as well as a built-in Macintosh
computer), a monitor, a camera, a microphone and headphones. The workstations
are connected via fibre passing through a system called ARES (Kurmann, 1997), which
simulates the behaviour of ATM channels in real time, with performance degradations
(such as delay or errors) for various network configurations and assumptions about
background traffic. The set-up is supervised by a control station, which sends delay settings
to ARES. The control station is also connected to the workstations, in order to ask
the test participants periodically for their ratings of the delay (by means of a
UDP-based client/server application). The ratings are sent back to the control station,
where the next delay is calculated according to best-PEST.
[Figure 23: wiring diagram connecting the Kubus videoconference stations, ARES, the control station and the recording station. Legend: Videoconference Network (ATM), Control Network (Ethernet), Recording (IEEE 845i), serial bus]
Figure 23 Wiring of the experimental set-up.
Furthermore, the video signals from the videoconference cameras are displayed on a
monitor in the observation area. Additionally, these signals are recorded on digital video
tape and saved directly onto a hard drive. For further analysis, the data is encoded
into MPEG-2 and burned to a DVD. The whole experimental set-up has a built-in
one-way delay of about 65 ms (with buffered audio stream). This means that the no-delay
situation presented to the subjects has in fact an absolute (sub-threshold) delay of 130
ms, i.e. twice the one-way delay. Table 12 lists the separate delays inherent in the particular
videoconference components. It can be seen that the major delay contributions are due to
the capturing and processing of visual information.
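The delay budget can be checked with a few lines of arithmetic. The sketch below uses the component values of Table 12; the only assumption is that the ATM network entry, which is given as "<1" ms, is counted with its upper bound of 1 ms. The names `COMPONENT_DELAYS_MS`, `one_way_delay_ms` and `round_trip_delay_ms` are hypothetical.

```python
# Average one-way delays in ms, taken from Table 12.
COMPONENT_DELAYS_MS = {
    "CCD camera": 30.0,
    "Digitiser": 0.0,
    "JPEG encoder": 1.0,
    "Channel buffer outgoing": 0.5,
    "ATM network": 1.0,  # Table 12 states "<1"; the upper bound is assumed here
    "Channel buffer incoming": 0.5,
    "JPEG decoder": 1.0,
    "Scaling and de-interlacing": 24.0,
    "Graphics card": 7.0,
}


def one_way_delay_ms(components=COMPONENT_DELAYS_MS):
    """Sum of all component delays along one direction of transmission."""
    return sum(components.values())


def round_trip_delay_ms(components=COMPONENT_DELAYS_MS):
    """Absolute delay of the 'no delay' situation: there and back again."""
    return 2.0 * one_way_delay_ms(components)


one_way = one_way_delay_ms()        # 65.0 ms
round_trip = round_trip_delay_ms()  # 130.0 ms
```

Camera, scaling and de-interlacing, and graphics card together account for 61 of the 65 ms, which is the sense in which visual capturing and processing dominate the built-in delay.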
Table 12 Minimal one-way delay in the videoconference network subdivided into the particular components. Data from (Rothlisberger, 1998).
Component                      Average Delay [ms]
CCD-Camera                     30
Digitiser                      0
JPEG-Encoder                   1
Channel Buffer outgoing        0.5
ATM Network                    <1
Channel Buffer incoming        0.5
JPEG-Decoder                   1
Scaling and De-Interlacing     24
Graphics-Card                  7
TOTAL                          65
4.2.2 Procedure
All experiments investigating HHI are conducted with the above-specified set-up
employing the best-PEST procedure. It is not possible to approach the thresholds of all
interacting subjects simultaneously, since the subjects share the same delay values, which are
calculated on the basis of the responses of only one subject. Therefore, in all HHI experiments using
adaptive methods, one subject has to be designated as the one whose threshold is determined.
The remaining subjects contribute only their corresponding ratings. In the following,
the particular procedures for the HHI experiments are described.
Absolute Delay: Basic Interaction Task (AudBas and VisBas)
The aim of this task was to evaluate the absolute delay threshold for basic auditory and
visual interactions. For this purpose the experimental subjects had either to count
from one to ten in alternating order (auditory condition), or to give hand signs in the
same way (visual condition). One of the two subjects held the relevant information
required for the best-PEST calculation. If the answer of this subject was correct, the delay
value for the next trial was decreased; otherwise it was increased. The subjects were
instructed to react as fast as possible after recognising the partner's expression. That way,
the unknown reaction time could be better controlled, in the sense that no reasoning
took place about the answer to give. Applying a 2AFC paradigm, this procedure had to
be carried out twice (see Figure 24): one course with an introduced delay computed by
best-PEST, and another course without any additional delay. The delay was randomly
introduced either in the first or the second course, and the subject's task was to indicate in
which of the two courses the delay was. With this task we expect to measure the lowest
possible threshold for absolute delays, since the subjects communicate with a maximal
degree of interactivity.
[Figure 24: timing diagram of the alternating stimulus expressions s of subjects A and B in courses (1) and (2); time axis t [ms]]
Figure 24 Test sequence of one trial consisting of two courses with five stimulus expressions per subject. The stimulus s is of auditory or visual type. τ_0 is the built-in delay plus reaction time. Δt is the transit delay equal to the maximum-likelihood estimate of the threshold computed with the best-PEST algorithm. The occurrence of Δt is randomly balanced between the first and the second course. The question after each sequence was: »In which course did you perceive a delay?«
Six female and 14 male subjects (aged between 21 and 35, mean=25) completed the
basic interaction task, giving a total of 640 ratings for different delay values. The
experimental design was within-group, i.e. all subjects performed all conditions. The subjects
were mainly recruited on campus and received 10 CHF for participation in the 20 minute
experiment.
Absolute Delay: Realistic Task (AudVisReal and AudReal)
The aim of this task was to evaluate the absolute delay thresholds for a realistic
communication scenario. Evoking natural conversations between subjects that are
captured by video cameras and observed by experimenters is still a great challenge.
Therefore, the least we can do is select a discussion topic, which is familiar to the target
group acting as experimental subjects. As for the basic task, the subjects for this task
were mainly university students. We consider them to be familiar with the problems of
shared flats, either from experience or from hearsay, and consequently they should have
a well-founded opinion to communicate. This makes this topic suitable to be discussed in
the experiment.
The task was structured in two parts. At first the subjects introduced themselves over
the videoconference (condition AudVisReal) or over the audio channel (condition
AudReal). They could do that freely or according to predefined questions to be
asked of each other. During this phase the supervisor introduced either pronounced delays
or no delay and informed the subjects accordingly, in order to acquaint them with
the delay issue. In the second phase the subjects were required to communicate freely
according to the following scenario: One of the three subjects has rented a four-room flat
and needs to find two flat-mates. The remaining two subjects play the prospective flat-mates. As
discussion hints they were given keywords such as shopping and food, visits of
friends, or cleaning regime. Furthermore, they had floor plans of the flat that were to be
used for the room allocation. During the second phase delay was introduced according to
the best-PEST algorithm, using a yes-no paradigm, i.e. after one minute of conversation,
the subjects were asked whether they perceived a delay or not. For these experiments we
ran two interleaved best-PEST calculations: one aiming to approach the perception
threshold, and another aiming to approach the acceptance threshold. After having
perceived a delay the subjects were asked whether it was disturbing or not.
In the audio-visual condition, 30 female and 47 male subjects (aged between 20 and 45,
mean=24) completed the realistic task, giving a total of 954 perception and 602 acceptance
ratings for different delay values. The subjects received 30 CHF for participation in
the 45 minute experiment. In the audio-only condition, 8 female and 22 male subjects
(aged between 19 and 32, mean=24) completed the realistic task, giving a total of 438
perception and 326 acceptance ratings for different delay values. The subjects received 10
CHF for participation in the 30 minute experiment.
4.2.3 HHI-Results
Absolute Delay: Basic Interaction Tasks (AudBas and VisBas)
Figure 25 shows the logistic psychometric functions for absolute delays. The 75 %
thresholds are at 196 ms (AudBas) and 204 ms (VisBas). The thresholds obtained by
the adaptive procedure best-PEST (see Figure 26) are 216 (±44) ms for AudBas and 237
(±92) ms for VisBas (numbers in brackets give the 95 % confidence intervals). A one-sided,
paired t-test shows that the two means are not significantly different (p>0.05).
Gender and age had no significant effect on the detection of basic interaction delays.
[Figure 25: psychometric curves for Auditory Perception of Absolute Delay and Visual Perception of Absolute Delay; horizontal axis: Absolute Delay (ms), 0 to 480]
Figure 25 Psychometric functions for absolute delay perception with auditory interaction (straight line) and visual interaction (dashed line). Arrows indicate the 75% thresholds. The experimental data are fitted with a logistic model. The data points represent rates of correct answers for particular delay values that have been obtained with equal or more than 30 measurements.
[Figure 26, panels A (auditory interaction, n=9) and B (visual interaction, n=9): delay plotted against number of trials (1 to 16), with plus and minus 95% C.I. and the final threshold]
Figure 26 The curves represent the mean of all subjects. Continuous lines show the progression of the threshold convergence. Dashed lines indicate the final thresholds; grey lines indicate the 95% confidence interval. A: Auditory interaction, B: Visual interaction.
Absolute Delay: Realistic Task (AudVisReal)
Figure 27 shows logistic psychometric functions for perception and acceptance when
interacting audio-visually with a realistic task (condition AudVisReal) (curve fitting by
means of the method of the least-squares). The 50 % perception threshold obtained is
1220 ms, and the 50 % acceptance threshold is 2080 ms.
[Figure 27: psychometric curves for Perception of Delay and Non-Acceptance of Delay; horizontal axis: Absolute Delay (ms), 0 to 2800]
Figure 27 Psychometric functions for absolute delay perception (straight line) and acceptance (dashed line) in a realistic, conversational task using both the audio and the visual channel. Arrows indicate the 50% thresholds. The experimental data are fitted with a logistic model. The data points represent rates of yes-answers for particular delay values that have been obtained with equal or more than 100 measurements.
Absolute Delay: Realistic Task (AudReal)
The thresholds of absolute delays - obtained by the adaptive procedure best-PEST -
in a task where only the audio channel is supported are at 970 (±330) ms for perception
and 1760 (±410) ms for acceptance (numbers in brackets give the 95 % confidence intervals).
A one-sided, paired t-test shows that the mean acceptance threshold is significantly
higher (p<0.01) than the mean perception threshold. Gender and age had no significant
effect on the perception and acceptance of absolute delays. Figure 28 shows the particular
logistic psychometric functions, obtained by a curve fitting procedure by means of the
method of least squares. The 50 % thresholds obtained are 800 ms (perception) and
1690 ms (acceptance), respectively.
[Figure 28: psychometric curves for Perception of Delay and Non-Acceptance of Delay; horizontal axis: Absolute Delay (ms), 0 to 2800]
Figure 28 Psychometric functions for absolute delay perception (straight line) and acceptance (dashed line) in a realistic, conversational task using only the audio channel. Arrows indicate the 50% thresholds. The experimental data are fitted with a logistic model. The data points represent rates of yes-answers for particular delay values that have been obtained with equal or more than 30 measurements.
Table 13 summarises the results of the threshold determinations conducted in the
HHI mode. Note that in the AudVisReal condition no best-PEST procedure was applied,
thus no individual thresholds were obtained. As a consequence it is not possible to
quote confidence intervals or to make statements about significance.
Table 13 Summary of results of the HHI experiments (n.a. means: not available).
                                              Basic Interaction
                                              AudBas          VisBas          AudVisReal          AudReal
Perception Threshold best-PEST                216 (±44) ms    237 (±92) ms    n.a.                970 (±330) ms
Perception Threshold fitted model             196 ms          204 ms          1220 ms             800 ms
Acceptance Threshold best-PEST                n.a.            n.a.            n.a.                1760 (±410) ms
Acceptance Threshold fitted model             n.a.            n.a.            2080 ms             1690 ms
Slope β_S of the standardised psychometric    3.316           3.162           0.8889 (perc.)      1.056 (perc.)
function (thresh. at 0.5)                                                     1.575 (accept.)     2.324 (accept.)
Experimental design                           Within-group                    Between-group
Significance difference (p<0.05)              no                              n.a.
Significance age (p<0.05)                     no              no              n.a.                no
Significance gender (p<0.05)                  no              no              n.a.                no
5 Discussion and Conclusions
In this chapter we discuss the results of the experiments described in the previous chapter. The discussion is divided into relative delays, absolute delays, and a section where we discuss the task dependency of the perception and acceptance of delay.
5.1 Regarding Relative Delays
5.1.1 In Human-Computer Interaction (HCI)
The summary of the HCI experiments in Table 11 (page 74) shows that the perception
threshold of a visual stimulus preceding an auditory stimulus is approximately 30 ms
higher than the perception threshold of stimuli in the reverse order. This is plausible since it
reflects human experience in a natural environment, where the propagation speed of light
is much higher than that of sound. Thus, humans are adapted to this situation and
thereby less sensitive to it. Other studies (Dixon et al., 1980; McGrath et al., 1985;
Lewkowicz, 1996) found that synchronisation errors are detected more easily the more
artificial the presented situation is. For our experiment, the chosen presentation is highly
artificial. Thus, we consider the thresholds we found to be suitable for the most stringent
conditions, as might be present e.g. in telesurgery applications. As suggested, just noticeable
relative delays may serve as a decision support for content and service providers and
network planners, in such a way that below these values users will not benefit from
optimisation of the network with respect to relative delays.
Since the detection performance for relative delays is distributed over the user population, it is
- from a service provider's point of view - a 'political' question what percentage of users
may be allowed to perceive a particular relative delay. Such detailed information can be
calculated from the psychometric functions of Figure 19 (page 70) (see Table 15). For
this purpose, the inverse function of the generic psychometric function of eq (8) (page
57) is determined:
$$\phi = \theta \left[ 1 - \frac{\ln\left(100\,\psi^{-1} - 1\right)}{2\beta_S} \right]$$
eq (18)
φ : delay [ms] {φ ∈ R | φ ≥ 0}
ψ : user percentage [%] {ψ ∈ R | 0 < ψ < 100}
β_S : standardised slope (i.e. threshold is at stimulus intensity of 0.5)
θ : threshold [ms]
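As a worked illustration of eq (18), the following sketch evaluates the inverse function for the parameter values of the relative delay experiments (θ = 77 ms, β_S = 1.756 for AV; θ = 98 ms, β_S = 2.252 for VA). The function name `delay_for_percentage` is hypothetical; the formula itself is eq (18).

```python
import math


def delay_for_percentage(psi, theta, beta_s):
    """Delay in ms that is detected by psi percent of users, according to eq (18)."""
    if not 0.0 < psi < 100.0:
        raise ValueError("psi must lie strictly between 0 and 100")
    return theta * (1.0 - math.log(100.0 / psi - 1.0) / (2.0 * beta_s))


# Parameter values of the fitted psychometric functions (relative delay experiments).
AV = {"theta": 77.0, "beta_s": 1.756}
VA = {"theta": 98.0, "beta_s": 2.252}

av_25 = round(delay_for_percentage(25, **AV))  # 53 ms
va_25 = round(delay_for_percentage(25, **VA))  # 74 ms
```

At ψ = 50 % the logarithm vanishes and the function returns the threshold θ itself, which is a quick consistency check of the formula.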
Table 14 Parameter values obtained from the resulting psychometric functions of the relative delay experiments. These values inserted in eq (18) lead to the values listed in Table 15.
             AV        VA
θ [ms]       77        98
β_S          1.756     2.252
5.1 Regarding Relative Delays
Table 15 Relative delay values perceived by particular percentages of users. Reading example: It can be expected that not more than 25 % of users will detect an AV-delay of 53 ms, and a VA-delay of 74 ms, respectively.
Percentage of users       Extent of asynchrony when       Extent of asynchrony when
detecting asynchrony      auditory precedes visual (AV)   visual precedes auditory (VA)
in HCI, ψ [%]             φ [ms]                          φ [ms]
 5                         12                              34
10                         29                              50
25                         53                              74
33                         61                              67
50                         77                              98
67                         92                             113
75                        101                             122
90                        125                             146
95                        141                             162
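As a cross-check of eq (18), the following minimal Python sketch evaluates the inverse psychometric function with the parameter values of Table 14. The function name `delay_for_percentage` and the dictionary `PARAMS` are hypothetical and not part of the thesis software; the formula is the reading of eq (18) that reproduces the tabulated values.

```python
import math

def delay_for_percentage(psi, theta, beta_s):
    """Delay [ms] detected by psi percent of users, following eq (18).

    psi    -- user percentage, 0 < psi < 100
    theta  -- threshold [ms]
    beta_s -- standardised slope
    """
    if not 0 < psi < 100:
        raise ValueError("psi must lie strictly between 0 and 100")
    return theta * (1 - math.log(100 / psi - 1) / (2 * beta_s))

# Parameter values from Table 14 (hypothetical container name).
PARAMS = {"AV": (77, 1.756), "VA": (98, 2.252)}

for psi in (5, 25, 50, 75, 95):
    av = delay_for_percentage(psi, *PARAMS["AV"])
    va = delay_for_percentage(psi, *PARAMS["VA"])
    print(f"{psi:3d} %  AV {av:6.1f} ms  VA {va:6.1f} ms")
```

Evaluated at ψ = 33 % for the VA condition, this reading of eq (18) gives about 83 ms, which differs from the 67 ms listed in Table 15.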
Note that - for reasons of consistency with the adaptive method best-PEST - we
used a logistic model to fit the data, thus assuming that the users' detection performance
follows a logistic function. In some respects this procedure might look unusual, since
measured values are commonly modelled by Gaussian distributions. But since the notation
of the logistic function is more practicable and the two distributions differ only
negligibly, we consider the chosen procedure more manageable in practice. This procedure is
also supported by the fact that the threshold values obtained with the model fitting
procedure are close to the threshold values obtained with the adaptive procedure, i.e. they are
within the 95 % confidence interval. From these results it can be concluded that the
logistic model is appropriate - at least in the middle of the response range, i.e. around the
threshold value.
If we take a closer look at the results, taking into consideration the different processing
times of auditory and visual stimuli, it appears that the threshold differences between AV
and VA are inverted: considering the point in time when auditory and visual stimuli are
available to consciousness, it becomes obvious that a bigger internal time difference τ_AV
than τ_VA is needed to perceive an audio-visual event as asynchronous (see Figure 29). At
the AV-threshold value of 71 ms, the perceived difference - in consideration of the
different processing times for auditory (15 ms) and visual (55 ms) stimuli - takes the value
τ_AV of approximately 110 ms (71 ms + 55 ms - 15 ms), whereas the perceived difference
τ_VA at the VA-threshold value of 105 ms is approximately 65 ms (105 ms + 15 ms - 55 ms).
[Figure 29: timelines of the auditory and visual stimuli; abscissa t [ms].]
Figure 29 Perceived time differences at the thresholds obtained from the relative delay experiments, when considering the processing time differences of auditory and visual stimuli.
Recent ERP findings (Molholm et al., 2002) suggest that the auditory component of
an audio-visual event prepares early visual areas in the cortex for the upcoming visual
component. Under the assumption that this preparation leads to an earlier conscious
perception of the visual stimulus, this effect could account for an equalisation of differences
between τ_AV and τ_VA, and could thereby indicate a universal time quantum within which
multimodal synchrony is perceived. Following this line of argument, it would be necessary
to know whether the complementary effect is also observed, i.e. whether the visual component
of an audio-visual event is found to prepare the auditory cortex. To our knowledge, such an
effect is not known and remains to be investigated. Although our data do not provide
sufficient evidence to elucidate such effects, they can serve as a basis for formulating
well-founded hypotheses.
5.2 Regarding Absolute Delays
5.2.1 In Human-Computer Interaction (HCI)
The results of Table 11 (page 74) show that it is significantly easier to perceive an
absolute delay when interacting with the computer by mouse clicks rather than vocally. Or
more precisely: voice-visual interaction delays are less likely to be detected than
click-visual interaction delays. The difference between the two thresholds is approximately 30 ms.
Since the voice trigger is less distinct than the mouse trigger, this difference
makes sense, i.e. the sharp onsets of mouse clicks facilitate the detection of delays,
compared to the blurred onsets of vocal utterances.
Table 16 Parameter values obtained from the resulting psychometric function of the absolute delay experiments in HCI. These values inserted in eq (18) (page 86) lead to the values listed in Table 17.
         VocVis   MouVis
θ        98       64.8
β_s      1.111    1.133
Table 17 Absolute delay values perceived by particular percentages of users. Reading example: It can be expected that up to 75 % of users will detect an absolute delay of 146 ms when interacting by voice. And up to 75 % of users will detect an absolute delay of 96 ms when interacting by mouse clicks.
Percentages of users       Absolute delay in            Absolute delay in
detecting absolute         VocVis interaction mode      MouVis interaction mode
delays in HCI, ψ [%]       φ [ms]                       φ [ms]
25                          50                           33
33                          67                           45
50                          98                           65
67                         129                           85
75                         146                           96
90                         195                          128
95                         228                          149
Table 11 (page 74) shows that the threshold obtained with the model fitting procedure
is within the 95 % confidence interval of the mean threshold obtained with best-PEST.
Thus - as in the relative delay experiments - we suggest that, around threshold, the
logistic model fits the experimental data well. However, considering the psychometric
function of Figure 21 (page 72), it appears that this is no longer true for small delays: the
best fit of the logistic function intersects the ordinate at a point above 50 %. Translated
to user percentages, this would mean that a certain percentage of users, say 10 %,
would detect a nonexistent delay. Actually, this scenario is conceivable in experiments
using the yes-no mode, when subjects give yes-answers without perceiving a delay
(commonly expressed by the false alarm rate ε). But in forced-choice experiments - thus in
the case at hand - this is assumed not to happen; rather, they are applied precisely because
one aims to avoid such response bias. The following two reasons illustrate why these
response biases are unlikely to occur:
• The subjects had to indicate in which presentation the delay occurred, not whether
there was a delay. That way, the subjects could not pretend to perceive a delay.
• The presentation containing the delay was randomly distributed, and varied
following the best-PEST procedure; thus it could not be anticipated.
Thus, since the bias is unlikely to result from methodical weaknesses, we must
reconsider and possibly revise the assumption that the logistic (and also the Gaussian)
distribution describes the users' detection performance. At least for absolute delays, we have
to take other distributions into account. Qualitatively, it seems that right-skewed
distributions (e.g. Log-Normal and Poisson^q) could match the data better. In fact, Limpert et al.
(2001) suggest that the Log-Normal distribution describes multiplicative biological processes
better than the popular Gaussian distribution. Furthermore, they analysed arbitrary
Gaussian measurement data, and found that the Log-Normal distribution matched the data
at least as well as the Gaussian distribution.
For further delay experiments, we suggest fitting the obtained data with a cumulative
Log-Normal or cumulative Poisson distribution, and using one of these distributions as
the underlying function of the best-PEST procedure. The drawback of this approach is
that both suggested models are computationally impractical. As a consequence, numerical
approximations of the two functions must be implemented.
q Interestingly, in Pacemaker-Switch-Accumulator models (see section 3.4.3 on page 52) the scientific discourse concerns - among other things - the question of which distribution the pacemaker frequency is likely to follow. The hypothesis that the frequency follows a Poisson distribution is increasingly supported (Gibbon, 1992; Wearden et al., 2001).
For the time being, we have to be satisfied with the data at hand. However, in order to
decide whether the logistic model is appropriate for small delays, we suggest a heuristic
rule for its use: if the standardised slope (i.e. where the threshold is set to the stimulus
intensity of 0.5) of the particular psychometric function is greater than 1.7, the model will
account for small delays. Complying with this rule, one can expect the logistic function
to intersect the ordinate at values smaller than 3.2 %. This means that less than 3.2 % of
the users would detect a nonexistent delay. As can be seen in Table 16, the standardised
slopes of the MouVis and the VoiVis conditions are smaller than 1.7. That is why we
refrain from stating user percentages detecting small delays for these conditions (see
Table 17 on page 89).
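The 3.2 % figure can be retraced with a short worked equation. The closed form below is an assumption: it is obtained by solving eq (18) for ψ, since eq (8) itself is not repeated in this chapter.

```latex
\psi(\varphi) = \frac{100}{1 + \exp\!\left[-2\beta_s\left(\frac{\varphi}{\theta} - 1\right)\right]}
\quad\Longrightarrow\quad
\psi(0) = \frac{100}{1 + e^{2\beta_s}}

% heuristic limit, \beta_s = 1.7:
\psi(0) = \frac{100}{1 + e^{3.4}} \approx \frac{100}{30.96} \approx 3.2\,\%
```

With the standardised slopes of Table 16 (1.111 and 1.133), the same expression gives roughly 9.8 % and 9.4 %, which is in the region of the 10 % mentioned above.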
5.2.2 In Human-Human Interaction (HHI)
In contrast to HCI, in HHI experiments it is not possible to approach the thresholds
of all interacting subjects simultaneously, since the subjects share the same delay values,
which are calculated on the basis of the responses of only one subject. Therefore, in all HHI
experiments using adaptive methods, one has to assign a determining subject whose threshold
is ultimately determined. The remaining subjects contribute only their corresponding
ratings. For this reason, the statistical power is not as high as could be expected from
the number of recruited subjects. In order to include all available information, we
therefore applied the curve fitting procedure already used in the HCI experiments, i.e.
we fitted an assumed logistic model to the data by means of the method of least squares.
From the obtained psychometric function, the desired thresholds can be read off.
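The least-squares fit described above can be illustrated with a small Python sketch. It is not the fitting routine used in the experiments: the grid search, the function names and the candidate ranges are assumptions made for illustration, the logistic form is the one implied by eq (18), and the data points are synthetic, generated from the model itself rather than measured.

```python
import math

def logistic(phi, theta, beta_s):
    """Proportion of positive responses (0..1) at delay phi [ms]."""
    return 1 / (1 + math.exp(-2 * beta_s * (phi / theta - 1)))

def fit_least_squares(delays, proportions, thetas, betas):
    """Return the (theta, beta_s) grid pair with the smallest squared error."""
    best = None
    for theta in thetas:
        for beta_s in betas:
            sse = sum((logistic(d, theta, beta_s) - p) ** 2
                      for d, p in zip(delays, proportions))
            if best is None or sse < best[0]:
                best = (sse, theta, beta_s)
    return best[1], best[2]

# Synthetic, noise-free illustration data generated from the model
# (theta = 196 ms, beta_s = 3.316); these are NOT measured values.
delays = list(range(0, 401, 50))
proportions = [logistic(d, 196, 3.316) for d in delays]

theta_hat, beta_hat = fit_least_squares(
    delays, proportions,
    thetas=range(100, 301),                  # candidate thresholds [ms]
    betas=[i / 20 for i in range(20, 101)],  # candidate slopes 1.00 .. 5.00
)
print(theta_hat, beta_hat)
```

A grid search is used here only because it needs no external library; with real, noisy ratings a dedicated optimiser would normally be preferable.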
Basic Auditory and Visual Interaction
With the curve fitting procedure we found a threshold of 196 ms for auditory, and of
204 ms for visual interaction. These values agree (i.e. are within the 95 % confidence
interval) with the threshold values obtained from the best-PEST procedure including only
half of the subjects (216 ±44 ms for auditory, 237 ±92 ms for visual interaction).
Recapitulating the results from the basic interaction task, we found an absolute delay threshold
of about 200 ms for both auditory and visual interactions. We should bear in mind that
this value is a difference threshold DL, on the basis of the built-in delay of 130 ms plus
the reaction time of the subjects, which is about 190 ms (Brebner, 1980). The suggested
value has to be understood in the following way: when confronted with an absolute
delay of 320 ms (130 ms + 190 ms), 50 % of the subjects were able to detect an additional
delay of 200 ms.
These results are in line with the results from the HCI experiments (see also
Zuberbühler et al., 2003), where we investigated the absolute delay between vocal input
and delayed visual computer-generated response (condition VoiVis). This absolute delay
is 115 (±23) ms, or approximately half of the present value. This makes sense, since in the
HCI experiments the subjects were not confronted with human interaction partners, and
therefore did not have to take into account the ambiguous (and fluctuating) human reaction time.
In contrast to the findings in the HCI experiments, the logistic model in this case also
accounts for small delays (i.e. the standardised slopes are greater than 1.7). For this reason we
can quote percentages of users detecting small delays (see Table 19).
Table 18 Parameter values obtained from the resulting psychometric function of the absolute delay experiments in HHI. These values inserted in eq (18) (page 86) lead to the values listed in Table 19.
         AudBas   VisBas
θ        196      204
β_s      3.316    3.162
Table 19 Absolute delay values perceived by particular percentages of users. Reading example: It can be expected that up to 75 % of users will detect an absolute delay of 228 ms in auditory HHI. And up to 75 % of users will detect an absolute delay of 239 ms in visual HHI.
Percentages of users       Absolute delay in       Absolute delay in
detecting absolute         auditory HHI            visual HHI
delays in HHI, ψ [%]       φ [ms]                  φ [ms]
 5                         109                     109
10                         131                     133
25                         164                     169
33                         175                     181
50                         196                     204
67                         217                     227
75                         228                     239
90                         261                     275
95                         283                     299
Realistic Tasks
As we have seen from the basic interaction task, involving two or more people in
interactive tasks almost doubles the threshold for perceiving absolute delays. This effect
becomes even more striking when the persons involved solve a realistic task instead of a task
designed to facilitate the perception of absolute delays. For the task of free discussion about
a familiar topic, we found a perception threshold of over 1200 ms, and an acceptance
threshold of almost 2100 ms (see Figure 27 on page 81). Since there was no evidence from other
studies to support such high values, we had dimensioned the experimental set-up only for a
maximal delay of 2800 ms. In the course of the experiment, we noticed that several
subjects did not even detect such a high delay, and a greater number of subjects did not find
it disturbing.
Such circumstances make the adaptive procedure best-PEST difficult to apply, since
after a few non-detections of the highest stimulus the algorithm requires a huge number
of detection trials in order to return to the testing range. Hence we did not pursue
adaptive procedures further, but instead presented particular delay values in random order and
recorded the respective ratings. As a consequence of this procedure we were no longer
able to determine individual thresholds, and thus cannot quote a confidence interval.
Instead we applied the model-fitting procedure described earlier to obtain the threshold
values and the psychometric functions depicted in Figure 27. They show two things:
• The perception and the acceptance functions are relatively flat, signifying
that either there exist no sharp thresholds (in which case one might discuss
whether the term threshold is appropriate in this context), or there are great
slope and threshold variances, i.e. some subjects have very good time
discrimination skills, while others have moderate to poor ones. Based on qualitative
observation of the subjects during the test, we tend to favour the latter
explanation.
• The non-standardised slopes of the perception and the acceptance functions
are essentially of the same size (1.02 versus 1.06). This may indicate that
two similar, linearly interconnected mechanisms are involved in perceiving
and rating absolute delays. This hypothesis must of course be
confirmed by further experimentation.
Our finding that the perception threshold is much greater in the realistic than in the
basic task, as well as the inconsistent threshold figures found in the literature^r, suggest three
conclusions:
• Perception and acceptance of absolute delays are very much task-dependent.
Therefore it is probably not helpful to recommend universal threshold
values; rather, they should be suggested for different task categories.
• The choice of value ranges is not a simple task and should be left to the
business strategy of the service provider.
• The main difference between the realistic and the basic task concerns the
degree of interactivity. Whereas in the basic task this variable is assumed to be at
its maximum, it is at a considerably lower level in the realistic task, since the
subjects spend more time studying the documents. Thus, the degree of
interactivity could act as the variable by which particular tasks (and
communication settings) can be classified. We consider the degree of interactivity as an
aggregate parameter including some of the verbal interaction parameters
suggested by O'Conaill et al. (1993): backchannels, interruptions, explicit handovers
and number of turns.
r Bouch for instance suggests a value no greater than 400 ms (Bouch et al., 2000b), whereas Isaacs and Tang suggest a delay of between 640 ms and 840 ms to be acceptable (Isaac et al., 1994) (the three figures refer to roundtrip delay).
In order to find reasons for the unexpectedly high perception and acceptance
thresholds in the audio-visual realistic task, we ran an experiment with the same task, using
the audio channel only. In this condition we found a perception threshold of 800 ms,
and an acceptance threshold of 1690 ms. These two values are within the 95 %
confidence interval of the threshold means obtained with the best-PEST procedure (970
(±330) ms for perception, and 1760 (±410) ms for acceptance). The fact that thresholds
in the audio-only condition are well below the thresholds in the audio-visual condition
suggests two possible explanations:
• The visual channel in an audio-visual application acts as a distractor, i.e. the
focus of attention is divided between the audio and the visual
channel. Since the audio channel apparently suffices to execute the chosen
task, the additional visual information does not yield additional clues for
detecting delays; on the contrary, it hampers the detection of delays. This does not
mean that the visual channel does not yield usable information. But it seems
that the gain in 'media richness' in audio-visual communication has to be
paid for by a loss of focussed perception.
• The use of videoconferences (VC) is still unfamiliar to the users acting as
experimental subjects, whereas audio-only conversation is not: since telephony
is very common for users, they are well trained to perceive and evaluate
situations differing from the ones considered normal. This is not the case in
VC, insofar as the subjects do not have a point of reference to compare the
experimental situation with^s.
The fact that, in the audio-only condition, perception and acceptance thresholds are
still above the values found in the literature suggests the following explanation:
• Three subjects participated in our experiment, whereas only two were
employed in experiments described in the available literature. The higher
threshold of our experiments could signify that additional conversation
members act as additional distractors, i.e. the focus of attention is divided
among all members. A further reason could be that - since the
videoconference does not support gaze awareness - the members are not sure when
they are addressed. This lowers the degree of interactivity, and thus the
perceptibility of absolute delays.
s This explanation resembles in some respects the reasoning that computer-mediated real-time communication is assumed to be compared to the reference point of natural face-to-face communication, whereas for other areas of computer-mediated communication (e.g. browsing the WWW), such a reference point does not exist (see also section 2.2, Scope of Investigation).
Table 20 Parameter values obtained from the resulting psychometric function of the absolute delay experiments executing a specific realistic task. These values inserted in eq (18) (page 86) lead to the values listed in Table 21.
         Perception               Acceptance
         AudVisReal   AudReal     AudVisReal   AudReal
θ        1220         800         2080         1690
β_s      0.889        1.06        1.58         2.32
Table 21 Absolute delay values perceived by particular percentages of users. Reading example: It can be expected that not more than 33 % of users will detect an absolute delay of 734 ms when interacting audio-visually, and 535 ms when interacting by audio only. And not more than 33 % of users will find an absolute delay of 1610 ms disturbing when interacting audio-visually, or 1430 ms when interacting by audio only. Note that these values apply only to the chosen task.
Percentages of users       Perception of absolute delay    Acceptance of absolute delay
detecting or accepting     AudVisReal     AudReal          AudVisReal     AudReal
absolute delays in HHI,    φ [ms]         φ [ms]           φ [ms]         φ [ms]
ψ [%]
 5                         n.a.           n.a.             n.a.            617
10                         n.a.           n.a.              629            889
25                          466            386             1350           1290
33                          734            535             1610           1430
50                         1220            800             2080           1690
67                         1710           1070             2550           1940
75                         1970           1220             2810           2090
90                         2730           1640             3530           2480
95                         3240           1920             4030           2750
Considering the values of the standardised slopes in Table 20, it appears that only the
value for the acceptance function in the realistic audio-only task is greater than the
suggested value of 1.7. That is why, in Table 21, no delay values are listed for small user
percentages in the other conditions.
5.3 Further Research
This thesis reveals several areas where further research is needed concerning the perception
and acceptance of delays in multimodal real-time communication, as well as human time
perception. In the following, future research needs are divided into relative and
absolute delays.
5.3.1 Relative Delay
Although figures for perception thresholds of relative delays between auditory and
visual stimuli have been suggested by several authors (Dixon et al., 1980; Summerfield,
1992; Lewkowicz, 1996; Steinmetz, 1996), only a few of them adopted psychophysical
procedures in their study designs. While one might argue that this is not necessary, since
the suggested values work well in practice, it is nevertheless of interest to verify
these values with different study paradigms, and for different contexts.
Furthermore, questions concerning the different asynchrony perception for different
modality orders are predominantly answered by intuitive explanations. A consistent
model of multimodal stimulus processing is still lacking. In this area, emerging
medical imaging techniques present a promising means of investigating questions concerning
multimodal stimulus integration in humans. They could provide deeper insight into why, e.g.,
AV-stimuli are detected more easily than VA-stimuli.
5.3.2 Absolute Delay
Since perception and acceptance of absolute delay are strongly task-dependent, we
suggested the degree of interactivity as the variable by which tasks and communication
settings can be assessed in terms of their delay sensitivity. The usefulness of this
variable must be verified. If it turns out to be appropriate, further work has to be
done to classify the abundance of relevant communication settings. Once the
communication settings evoking the same degree of interactivity are pooled, further
experiments must be conducted with some representative communication settings. The goal of
such a procedure is to obtain psychometric functions for particular interactivity rates.
Since interactivity is considered a parameter which can be continuously measured in
networked services, it should be possible to adjust delay values according to the
measured degree of interactivity. Knowing the associated psychometric function,
the delay can be set according to a predefined (or negotiated) percentage of users
perceiving or accepting this particular delay.
On a more theoretical side, further research is needed for modelling time perception.
Although a lot of work has already been done in different research areas, it is still to be
discovered which neurological mechanisms are responsible for conscious time perception. As
a consequence, for the time being, it remains an open question which distribution time
perception follows. And most notably, it is not understood how contextual factors such
as attention, arousal, modality, mood, age, and intelligence influence the conscious
estimation of durations.
Annex
Developed Software: The best-PEST Calculator
As a methodical outcome of the threshold experiments, we developed the best-PEST
method we had used into a fully independent, browser-based application. The idea was to
provide experimenters with a tool for measuring thresholds which can be used without
any installation, compilation, or even programming effort (in contrast to
other available software). The drawback of this premise lies in the missing interface. For
security reasons the program has no access to the client computer and therefore cannot
provide it with the estimated values directly. The experimenters have to enter the
received threshold values into their testing environment by hand. This makes the
best-PEST Calculator especially useful for threshold estimations whose stimulus
presentation cannot be done with the aid of common computer equipment, e.g. smell
and taste thresholds. This manual and the program can be downloaded from the
following internet address, also quoted in Zuberbühler (2002):
http://www.psychophysics.ethz.ch/tools/
Depending on the version used, the browser has to be updated with the Macromedia
Director plug-in version 8.5. The software automatically recognises whether an update is
necessary, whereupon it can be done within three or four mouse clicks.
In the following the best-PEST Calculator is described. This description can also be
downloaded from the above-mentioned link.
Description
In the following Figure 30, Figure 31, and Figure 32, screenshots of the three masks of
the program are shown and the input and output fields are explained where they are not
self-explanatory (indicated by numbers).
[Figure 30: screenshot of the 'Settings' mask with numbered input fields, among them the mode (forced-choice method, nAFC) and the number of trials.]
Figure 30 Screenshot of the first mask (input), where the settings for the experiment are entered. If all the fields are filled out in the requested format, pressing the 'start' button will lead to the second input mask. If not, a dialogue window pops up, indicating the missing or false input. Clicking the arrow opens the 'advanced settings' fields. By default these settings are: 'slope β' = 2, 'false negative δ' = 0, 'false positive ε' = 0, 'mean of x trials' = 3.
① Mode
In the drop-down menu 'mode', the users have the choice between the yes-no and the forced-choice (nAFC) paradigm. If they choose nAFC, an additional input field appears, where the number of alternatives n is to be entered. If n > 100 is entered, the program switches automatically to the yes-no calculation mode. It should be noted that experimental subjects will most likely be overtaxed if they have to make repeated decisions about the presence of a stimulus among more than a hundred alternatives. In any case, if such experiments are planned, one can expect the error caused by the slightly inadequate calculation to be much smaller than the error caused by any other interference - for instance the subject's lapses.
② Start value k
Setting of the test interval [0, k], where k determines the highest stimulus value that can be obtained during the run. The upper limit k should be at least twice as large as the expected threshold value. Note that the start value will not be presented to the subject, on the assumption that this value is so high that subjects will perceive it in all cases. In order to deal with comparable slope values, the algorithm uses the normalised range [0, 1] of the stimulus intensity. The stimulus intensity φ is therefore given by:
\[
\varphi = \frac{\varphi^{*}}{k} \qquad \text{eq (19)}
\]
φ*: stimulus intensity in desired unit {φ* ∈ ℝ | φ* ≥ 0}
k: stimulus maximum {k ∈ ℝ | k > 0}
③ Smallest Step Size
Determines the size of the smallest stimulus change that can be obtained. Ideally this is the difference threshold of the particular stimulus. If this value is not known - as in the case where we just want to determine it - we have to estimate a suitable step size. Experimenters need to beware of step sizes that are too small or too big, since both result in a large measurement bias of the thresholds. If the ratio between 'start value' and 'smallest step size' is larger than 1000, the program will prompt a warning and ask for either a bigger step size or a smaller start value. This is a precautionary measure to prevent lengthy computing times.
④ Termination Criterion
Users have the choice between 'Number of Trials' and 'Number of Reversals'. A reversal R is defined as a change from increasing to decreasing (or the other way around) of the presented stimulus intensities M.
\[
M = \{\, m_i \in \mathbb{R} \mid m_i \text{ is presented at trial } i \,\} \qquad \text{eq (20)}
\]
\[
R = \{\, m_i \in M \mid (m_{i-1} > m_i < m_{i+1}) \lor (m_{i-1} < m_i > m_{i+1}) \,\} \qquad \text{eq (21)}
\]
M: set of presented stimulus intensities
R: set of reversals
⑤ Advanced Setting: Slope β
As an advanced setting, the users have the opportunity to enter the estimated or known slope of the particular psychometric function. For the definition of the slope see Figure 14 and eq (7). The slope value is calculated according to equal-scaled axes. Entering β implies knowledge about the tested cohort or subject, usually gained through pre-testing. If the slope is not known, β will be set to two by default.
⑥ Advanced Setting: false negative δ
δ specifies the false negative rate (or miss rate). This rate is constituted by the observers' negative answers even though the stimulus intensity is at maximum. Entering δ implies knowledge about the tested cohort or subject, usually gained through pre-testing. By default this value is zero.
⑦ Advanced Setting: false positive ε
ε specifies the false positive rate (or false alarm rate). This rate is constituted by the observers' positive answers even though the stimulus intensity is zero. In forced-choice experiments, ε does not comprise the methodical false alarm rate, which is the reciprocal value of the number of alternatives. Entering ε implies knowledge about the tested cohort or subject, usually gained through pre-testing. By default this value is zero.
⑧ Advanced Setting: Mean of x Trials
x specifies the number of trials to take at the end of an experimental run for calculating the mean threshold value. As a rule of thumb, larger numbers of trials permit larger values of x. By default this value is three.
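The input rules and the reversal definition described for the first mask can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `check_settings`, `normalise` and `count_reversals` are hypothetical and do not come from the best-PEST Calculator; the limits (n > 100, ratio > 1000) and eq (19) to eq (21) are taken from the description.

```python
def check_settings(mode, start_value_k, smallest_step, n_alternatives=None):
    """Apply the input rules described for the first mask.

    Returns the calculation mode that would be used; raises ValueError
    where the description says the program prompts a warning.
    """
    if start_value_k <= 0:
        raise ValueError("start value k must be positive")
    if smallest_step <= 0:
        raise ValueError("smallest step size must be positive")
    # Precaution against lengthy computing times.
    if start_value_k / smallest_step > 1000:
        raise ValueError("choose a bigger step size or a smaller start value")
    # With more than 100 alternatives the yes-no calculation is used instead.
    if mode == "nAFC" and n_alternatives is not None and n_alternatives > 100:
        return "yes-no"
    return mode

def normalise(phi_star, start_value_k):
    """Map a stimulus in the desired unit onto the range [0, 1], eq (19)."""
    return phi_star / start_value_k

def count_reversals(intensities):
    """Count strict local extrema of the presented intensities, eq (21).

    Equal neighbouring values are not counted, because eq (21) uses
    strict inequalities.
    """
    return sum(
        1
        for prev, cur, nxt in zip(intensities, intensities[1:], intensities[2:])
        if (prev > cur < nxt) or (prev < cur > nxt)
    )

print(check_settings("nAFC", 220, 5.5, n_alternatives=2))
print(normalise(110, 220))
print(count_reversals([110, 165, 137, 151, 144, 148]))
```

Counting reversals over the whole run keeps the termination test independent of how the next intensity is computed.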
[Figure 31: screenshot showing the output field 'Next value to present to the subject is', the response buttons 'CORRECT' and 'INCORRECT', and the button 'Calculate next value'.]
Figure 31 Screenshot of the second mask (input/output), where the computation of the actual maximum likelihood threshold is done. Pressing the button 'back' will abort the computation and return to the first mask to modify the settings. Pressing the 'cancel' button will abort the computation and switch the program to the results mask, which displays the current status of the experiment, without the termination criterion having been reached.
⑨ Step 1: Output from the best-PEST algorithm
The output value m_i is to be presented to the subject. This value is the maximum likelihood estimate of the threshold, obtained from all available information. Since there is no information available from the subjects in the first trial, the initialisation is conducted assuming that 100 % of the subjects will perceive the stimulus at the start intensity k, and that at zero intensity they will be certain not to perceive the stimulus. Therefore the first output will fall somewhere in the middle of the test interval.
⑩ Step 2: Response of the subject
After the subjects have been presented with the stimulus intensity obtained from step 1, the radio button corresponding to the subject's response is to be selected. In the nAFC mode the buttons are labelled 'CORRECT' and 'INCORRECT', and in the yes-no mode they are labelled 'YES' and 'NO'.
⑪ Step 3: Next value
Pressing the button 'calculate next value' will trigger the next calculation, whereupon a new value will appear in the output field. Steps 1 to 3 have to be repeated until the termination criterion is reached. Once it is reached, pressing this button will bring the program to the 'results' mask.
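Steps 1 to 3 can be illustrated with a compact maximum-likelihood sketch in Python. It is not the code of the best-PEST Calculator: the class name `BestPestSketch`, the grid-based search and the logistic form with slope β at the threshold are assumptions made for illustration, only the yes-no mode is covered, and the false negative and false positive rates are taken as zero. The start value 220 and step size 5.5 are illustrative choices.

```python
import math

class BestPestSketch:
    """Grid-based maximum-likelihood threshold tracking (yes-no mode only)."""

    def __init__(self, start_value_k, smallest_step, slope=2.0):
        self.k = start_value_k
        self.slope = slope
        n_steps = round(start_value_k / smallest_step)
        # Candidate thresholds on the normalised range [0, 1].
        self.grid = [i / n_steps for i in range(n_steps + 1)]
        self.log_lik = [0.0] * len(self.grid)
        # Initialisation: assume "yes" at intensity k and "no" at intensity 0.
        self._update(1.0, True)
        self._update(0.0, False)

    def _p_yes(self, x, theta):
        # Assumed logistic model with slope `slope` at the threshold.
        return 1 / (1 + math.exp(-4 * self.slope * (x - theta)))

    def _update(self, x, detected):
        # Add the log-probability of this response for every candidate.
        for i, theta in enumerate(self.grid):
            p = self._p_yes(x, theta)
            self.log_lik[i] += math.log(p if detected else 1 - p)

    def next_value(self):
        """Step 1: current maximum-likelihood estimate in the desired unit."""
        best = max(range(len(self.grid)), key=lambda i: self.log_lik[i])
        return self.grid[best] * self.k

    def record(self, presented_value, detected):
        """Step 2: store the subject's response to the presented value."""
        self._update(presented_value / self.k, detected)

run = BestPestSketch(start_value_k=220, smallest_step=5.5)
first = run.next_value()          # falls in the middle of [0, k]
run.record(first, detected=True)  # a detection moves the estimate down
print(first, run.next_value())
```

With the two initial assumptions alone, the likelihood is symmetric about the middle of the test interval, which is why the first output falls there.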
[Figure 32: screenshot of the 'Results' mask showing the field 'Threshold is at', the field 'Values', and a graph with stimulus intensity as ordinate and number of trials as abscissa.]
Figure 32 Screenshot of the third mask (output), where the results of the entire experimental run are displayed. Pressing the 'start again' button will return the program to the first mask, and leave the settings as they are.
⑫ Threshold value
Output of the final threshold estimate, which is the mean value of the last x trials.
⑬ All values
The presented stimulus intensities of the entire experimental run are displayed and marked in the field 'values' in order to copy them to the clipboard (Ctrl + C).
⑭ Graph
The values of the entire experimental run as well as the final threshold are shown in a diagram with stimulus intensity as ordinate and number of trials as abscissa.
Monte-Carlo-Simulations
The following Monte-Carlo-Simulations were made to evaluate the convergence behaviour of the best-PEST algorithm. All simulations were made in the yes/no mode with equal start values. A built-in random process simulated the response behaviour of an assumed experimental subject, which we call the stochastic observer. For that purpose we assumed that the stochastic observer answers in a logistic manner with a stable threshold, an assumption that is in fact made by best-PEST:
According to eq (17) (page 62), $\theta_N$ is the N-th threshold estimate produced by best-PEST. For this estimate there is, according to eq (8) (page 57), a probability $\psi(\theta_N)$ of a positive response. We obtain the particular answer of the stochastic observer by applying the following procedure: if $\psi(\theta_N)$ is greater than a uniformly distributed random number between 0 and 1, the stochastic observer answers yes; if $\psi(\theta_N)$ is equal to or smaller than the random number, the stochastic observer answers no. That way, after a sufficient number of runs we map the outcome of the best-PEST procedure onto the assumed psychometric function of the stochastic observer, and perhaps an empirical law of the algorithm's behaviour can be established.
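A stochastic observer of this kind, coupled to a maximum-likelihood estimator, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis software: the logistic form of `psi`, the grid search in `ml_threshold`, and all function and parameter names are hypothetical, and the observer is taken to answer yes with probability `psi`. Model and observer share the same slope here.

```python
import math
import random
import statistics

def psi(x, theta, slope):
    """Assumed logistic psychometric function: P(yes) at intensity x."""
    return 1.0 / (1.0 + math.exp(-slope * (x - theta)))

def observer_says_yes(x, theta_obs, slope_obs, rng):
    """Stochastic observer: answers yes with probability psi(x)."""
    return rng.random() < psi(x, theta_obs, slope_obs)

def ml_threshold(data, k, slope, steps=40):
    """Grid point in [0, k] that maximises the log-likelihood of the data."""
    def log_likelihood(theta):
        total = 0.0
        for x, yes in data:
            p = psi(x, theta, slope)
            total += math.log(p if yes else 1.0 - p)
        return total
    return max((k * i / steps for i in range(steps + 1)), key=log_likelihood)

def simulate_run(k=1.7391, theta_obs=1.0, slope=1.0, n_trials=15, last_x=3, rng=None):
    """One threshold determination: mean of the last_x presented intensities."""
    if rng is None:
        rng = random.Random()
    data = [(k, True), (0.0, False)]  # initialisation: sure yes at k, sure no at 0
    presented = []
    for _ in range(n_trials):
        x = ml_threshold(data, k, slope)  # next stimulus = current ML estimate
        presented.append(x)
        data.append((x, observer_says_yes(x, theta_obs, slope, rng)))
    return statistics.mean(presented[-last_x:])

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, only to make the illustration repeatable
    means = [statistics.mean(simulate_run(rng=rng) for _ in range(3)) for _ in range(100)]
    print(statistics.mean(means), statistics.variance(means))
```

The defaults follow the start value, observer threshold, grid size and trial count given for the simulations; because the psychometric function and search procedure are assumptions, the sketch should not be expected to reproduce the reported figures.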
In the following we show the results of three simulation runs. Table 22 lists the corresponding parameter settings for the conducted simulations, whose results are displayed in Figure 33, Figure 34, and Figure 35.
Table 22 Parameter settings used for the Monte-Carlo-Simulations, separated for three conditions. For an explanation of the parameters see the previous chapter.

| Parameter | Figure 33 | Figure 34 | Figure 35 |
|---|---|---|---|
| Mode | yes/no | yes/no | yes/no |
| Start value k | 1.7391 | 1.7391 | 1.7391 |
| Threshold θ of the stochastic observer | 1.0000 | 1.0000 | 1.0000 |
| Start value k / smallest step size | 40 | 40 | 40 |
| Termination criterion: Number of Trials | 15 | 5 to 50 | 50 |
| Slopes of best-PEST's model | 1.0 to 3.5 | 0.1 to 5.0 | 0.1 to 5.0 |
| Slopes of the stochastic observer's psychometric function | same steps | same steps | 0.1 to 5.0 |
| False negative δ | 0 | 0 | 0 |
| False positive ε | 0 | 0 | 0 |
| Mean of x trials | 3 | 3 | 3 |
| Number of threshold determinations per measuring point | 3 | 1000 | 1000 |
| Number of measuring points | 2500 | 2500 | 2500 |
In order to gain an idea of the accuracy the best-PEST algorithm provides, we ran a simulation with realistic parameter settings. As a trade-off between accuracy and practicability, the simulated subject completes three threshold determinations, each consisting of 15 stimulus presentations with the corresponding decisions. The whole procedure thus corresponds to approximately 30 minutes of real time, depending of course on the duration of each stimulus presentation. With such a scenario, experimenters can be sure that the subjects' fatigue will play a negligible role. For the simulations, we ran the above-mentioned scenario with slopes from 1.0 to 3.5, resulting in a total of 2500 threshold means. The histogram of this distribution can be seen in Figure 33.
[Figure 33: histogram of frequency against threshold (target value = 1.0); abscissa from 0.7 to 1.3.]
Figure 33 Distribution of the threshold values obtained with the best-PEST algorithm. The stochastic observer's threshold is 1.0 (target value). The distribution is based on 2500 threshold determinations, each representing the mean of 3 runs.
The distribution is approximately Gaussian with a mean of 0.99755, and a variance of
0.00764.
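To read this variance on the scale of the threshold itself, it can be converted into a standard deviation. The following worked calculation relies on the Gaussian approximation stated above; the factor 1.96 is the usual two-sided 95% quantile of the normal distribution and is an assumption brought in for illustration, not a value from the simulations.

```latex
\[
\sigma = \sqrt{0.00764} \approx 0.0874, \qquad
0.99755 \pm 1.96\,\sigma \approx 0.99755 \pm 0.171
\;\Rightarrow\; [0.826,\ 1.169]
\]
```

Under this approximation, roughly 95% of the simulated threshold means lie within about 17% of the target value 1.0.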
The aim of the second simulation was to gain insight into the convergence behaviour of best-PEST for different numbers of trials until termination, and for different slope values of both the stochastic observer and the best-PEST model. For that purpose we calculated the variance of the mean threshold after 1000 runs as a function of the mentioned variables. The contour lines of equal variance in the range [0, 0.05] can be seen in Figure 34.
[Figure 34: contour plot; abscissa: number of trials (5 to 50); ordinate: slope.]
Figure 34 Simulation of threshold determination with the best-PEST algorithm. The curves show contour lines of threshold variances up to 0.05. The number of trials until a threshold determination stops is on the abscissa; the slopes of the psychometric functions of both stochastic observer and model are on the ordinate. The variance is calculated on the basis of 1000 threshold determinations for each measuring point. The slope's increment is 0.1; the number of trials' increment is 1.
The contour lines of equal variance of the mean threshold are approximately exponential curves, which is consistent with the interpretation that each additional trial has diminishing marginal utility. This interpretation follows from the nature of the best-PEST procedure: the information gained relative to the information already available decreases with every additional trial, and therefore successive threshold estimates approach the true threshold value. A further prediction that can be made from these data is that the number of trials plays an important role only for large slopes of the psychometric functions.
The third simulation was made in order to analyse the convergence behaviour of best-PEST for different, interdependent slope values of the observer's and of the model's psychometric function. For that purpose we calculated, as in the second simulation, the variance of the mean threshold after 1000 runs as a function of the two slope variables. The contour lines of equal variance in the range [0, 0.05] can be seen in Figure 35.
[Figure 35: contour plot; abscissa: slope of model (0.1 to 5.0); ordinate: slope of the stochastic observer.]
Figure 35 Simulation of threshold determination with the best-PEST algorithm. The curves show contour lines of threshold variances up to 0.05. The slope of the model is on the abscissa; the slope of the stochastic observer is on the ordinate. The variance is calculated on the basis of 1000 threshold determinations for each measuring point. The increment is 0.1 for both variables.
At first sight the curves of equal variance suggest no simple, explainable model of the interdependence of the two slope parameters. What can be read from them is that there is no reason to choose model slopes much bigger than the observer slope, since they increase the variance for a given observer slope, especially in its lower range. As a rule of thumb, a model slope twice as big as the observer slope will provide the best results, since there seems to be a relative minimum on each contour line at these points.
References
Alfano, M. (2000). QUASIMODO - Quality of Service Methodologies and solutions within the service framework: measuring, managing and charging QoS: EURESCOM: European Institute for Research and Strategic Studies in Telecommunications.
Armitage, G. (2000). MPLS: The Magic Behind the Myths. IEEE Communications Magazine, 38(1), 124-131.
Baird, J. C., & Noma, E. (1978). Fundamentals of scaling and psychophysics. New York: Wiley.
Bales, R. F. (1955). How people interact in conferences. Scientific American, 3-7.
Bales, R. F. (1999). Social interaction systems: Theory and measurement. New Brunswick: Transaction Publishers.
Block, R. A., & Zakay, D. (2001). Internal Clocks and the Representation of Time. In C. Hoerl & T. McCormack (Eds.), Time and Memory - Issues in Philosophy and Psychology (pp. 59-76). Oxford: Oxford University Press Inc.
Boltz, M. G. (1994). Changes in internal tempo and effects on the learning and remembering of event durations. Journal of Experimental Psychology, 20, 1154-1171.
Bouch, A., Bhatti, N., & Kuchinsky, A. J. (2000a). Quality is in the eye of the beholder: Meeting users' requirements for Internet Quality of Service. CHI'2000, Hague.
Bouch, A., Sasse, M. A., & DeMeer, H. (2000b). Of Packets and People: A User-Centred Approach to Quality of Service. IWQoS 2000, Pittsburgh, PA.
Braun, A. (2003). Qualitätsaspekte multimodaler Kommunikation: Subjektive und objektive Messungen. PhD thesis, Swiss Federal Institute of Technology, Zurich.
Brebner, J. T. (1980). Reaction Time in Personality Theory. In A. T. Welford (Ed.), Reaction Times (pp. 309-320). New York: Academic Press.
Brebner, J. T., & Welford, A. T. (1980). Introduction: An Historical Background Sketch. In A. T. Welford (Ed.), Reaction Times (pp. 1-23). New York: Academic Press.
Brown, S. W. (1995). Time, change, and motion: The effects of stimulus movement on temporal perception. Perception & Psychophysics, 57, 105-116.
Buonomano, D. V., & Karmarkar, U. R. (2002). How Do We Tell Time? The Neuroscientist, 8(1), 42-51.
Carr, C. E. (1993). Processing of temporal information in the brain. Annual Review of Neuroscience, 16, 223-243.
Celesia, G. G., & Puletti, F. (1971). Auditory input to the human cortex during states of drowsiness and surgical anesthesia. Electroencephalography and Clinical Neurophysiology, 31, 603-609.
Chen, T. M., Walrand, J., & Messerschmitt, D. G. (1989). Protocols for Packet Voice. IEEE Selected Areas in Communication.
Church, R. M. (1984). Properties of the internal clock. Annals of the New York Academy of Sciences, 424, 566-582.
Church, R. M., Meck, W. H., & Gibbon, J. (1994). Application of scalar timing theory to individual trials. Journal of Experimental Psychology: Animal Behaviour, 20, 135-155.
Clark, V. P., Fan, S., & Hillyard, S. A. (1995). Identification of early visual evoked potential generators by retinotopic and topographic analyses. Human Brain Mapping, 2, 170-187.
Clark, V. P., & Hillyard, S. A. (1996). Spatial selective attention affects early extrastriate but not striate components of the visual evoked potential. Journal of Cognitive Neuroscience, 8, 387-402.
Coffman, K. G., & Odlyzko, A. M. (1998). The size and growth rate of the internet. http://www.dtc.umn.edu/~odlyzko/doc/internet.size.pdf
Dixon, N. F., & Spitz, L. (1980). The detection of auditory visual desynchrony. Perception, 9, 719-721.
Fieandt, K., Huhtala, A., Kullberg, P., & Saarl, K. (1956). Personal tempo and phenomenal time at different age levels. (2). Helsinki: University of Helsinki.
Fluckiger, F. (1995). Understanding networked multimedia: applications and technology. London: Prentice Hall.
Foxe, J. J., & Simpson, G. V. (2002). Flow of activation from V1 to frontal cortex in humans: a framework for defining 'early' visual processing. Experimental Brain Research.
Fraisse, P. (1964). The psychology of time. London: Eyre and Spottiswoode.
Galambos, R., Makeig, S., & Talmachoff, P. J. (1981). A 40-Hz auditory potential recorded from the human scalp. Proceedings of the National Academy of Sciences, 78, 2643-2647.
Galton, F. (1899). On instruments for (1) testing perception of differences of tint and for (2) determining reaction time. Journal of the Anthropological Institute(19), 27-29.
Gescheider, G. A. (1997). Psychophysics: The Fundamentals (3 ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Giard, M. H., & Peronnet, F. (1999). Auditory-Visual Integration during Multimodal Object Recognition in Humans: A Behavioral and Electrophysiological Study. Journal of Cognitive Neuroscience, 11(5), 473-490.
Gibbon, J. (1992). Ubiquity of scalar timing with a Poisson clock. Journal of Mathematical Psychology, 36, 283-293.
Gibbon, J., Church, R. M., & Meck, W. H. (1984). Scalar timing in memory. Annals of the New York Academy of Sciences, 424, 52-77.
Goldstone, S., & Lhamon, W. T. (1974). Studies of auditory-visual differences in human time judgment: 1. Sounds are judged longer than lights. Perceptual and Motor Skills, 39, 63-82.
Gonsalves, T. (1989). Comparative Performance of Voice/Data Local Area Networks. IEEE Selected Areas in Communication.
Guttormsen Schär, S., Arial, M., Zuberbühler, H. J., & Krueger, H. (2002). Distributed Co-operative Design Systems: supporting Human Factors with 'Communicate-It'. 28th Annual Conference of the IEEE Industrial Electronics Society, Sevilla, Spain.
Helder, G. K. (1966). Customer Evaluation of Telephone Circuits with Delay. Bell System Technical Journal, 38(9).
Hershenson, M. (1962). Reaction time as a measure of intersensory facilitation. Journal of Experimental Psychology, 63, 289-293.
Hirsh, I. J., & Sherrick, C. E. (1961). Perceived order in different sense modalities. Journal of Experimental Psychology, 62, 423-432.
Isaac, E., & Tang, J. (1994). What video can and can't do for collaboration: a case study. Multimedia Systems, 2, 63-73.
Jokeit, H. (1990). Analysis of periodicities in human reaction times. Naturwissenschaften, 77, 289-291.
Kohfeld, D. L. (1971). Simple reaction time as a function of stimulus intensity in decibels of light and sound. Journal of Experimental Psychology, 88, 251-257.
Kouvelas, I., Hardman, V., & Watson, A. (1996). Lip Synchronisation for Use Over the Internet: Analysis and Implementation. IEEE Globecom'96, London UK.
Krueger, H. (1994). Wahrnehmung und Befindlichkeit ins richtige Licht gesetzt. 11. Gemeinschaftstagung der Lichttechnischen Gesellschaften der Schweiz, Deutschlands, der Niederlande und Österreichs, Interlaken.
Kündig, A., Zuberbühler, H. J., & Braun, A. (2001). QoS User Expectations: State of the Art - Key Parameters - their Relevance and their Determination (QED-R-2). Zürich: ETHZ / TIK, IHA.
Kurmann, H. (1997). On the Emulation of Impairments in ATM-Networks. PhD Thesis, Swiss Federal Institute of Technology, Zurich.
Lejeune, H. (1998). Switching or gating? The attentional challenge in cognitive models of psychological time. Behavioural Processes(44), 127-145.
Lewkowicz, D. J. (1996). Perception of auditory-visual temporal synchrony in human infants. Journal of Experimental Psychology: Human Perception and Performance, 22(5), 1094-1106.
Limpert, E., Stahel, W. A., & Abbt, M. (2001). Log-normal distributions across the sciences - keys and clues. BioScience, 51, 341-352.
Longuet-Higgins, H. C. (1968). Holographic model of temporal recall. Nature, 217, 104.
Madler, C., & Pöppel, E. (1987). Auditory evoked potentials indicate the loss of neuronal oscillations during general anaesthesia. Naturwissenschaften, 74, 42-43.
McDonald, J. J., & Teder-Sälejärvi, W. A. (2000). Involuntary orienting to sound improves visual perception. Nature(407), 906-908.
McGrath, M., & Summerfield, Q. (1985). Intermodal timing relations and audiovisual speech recognition by normal-hearing adults. J. Acoust. Soc. Am., 77(2), 678-685.
Miall, C. (1996). Models of neural timing. In M. A. Pastor & J. Artieda (Eds.), Time, Internal Clocks and Movement (pp. 69-94). Amsterdam: Elsevier Science B.V.
Miller, J. O., & Low, K. (2001). Motor processes in simple, go/no-go, and choice reaction time tasks: a psychophysiological analysis. Journal of Experimental Psychology: Human Perception and Performance, 27, 266.
Molholm, S., Ritter, W., Murray, M. M., Javitt, D. C., Schroeder, C. E., & Foxe, J. J. (2002). Multisensory auditory-visual interactions during early sensory processing in humans: a high-density electrical mapping study. Cognitive Brain Research, 14, 115-128.
O'Conaill, B., Wittaker, S., & Willbur, S. (1993). Conversations Over Videoconference: an Evaluation of the Spoken Aspects of Video-Mediated Communications. Human-computer interaction, 8, 389-428.
Odlyzko, A. M. (2000). Internet Growth: Myth and Reality, Use and Abuse. iMP: Information Impacts Magazine(November).
Oviatt, S., & Cohen, P. R. (2000). Designing the User Interface for Multimodal Speech and Pen-Based Gesture Applications: State-of-the-Art Systems and Future Research Directions. Human Computer Interaction, 15, 263-322.
Pandey, P. C., Kunov, H., & Abel, S. (1986). Disruptive effects of auditory signal delay on speech perception with lipreading. The Journal of Auditory Research, 26, 27-41.
Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28(4), 377-379.
Pöppel, E. (1971). Oscillations as possible basis for time perception. Studium Generale, 24, 85-107.
Pöppel, E. (1978). Time Perception. In R. Held & H. Leibowitz & H.-L. Teuber (Eds.), Handbook of Sensory Physiology (Vol. VIII: Perception, pp. 713-729). Berlin: Springer.
Pöppel, E. (1986). Neuronal oscillations in the brain. Discontinuous initiations of pursuit eye movements indicate a 30-Hz temporal framework for visual information processing. Naturwissenschaften, 77, 289-291.
Pöppel, E. (1994). Temporal Mechanisms in Perception. International review of neurobiology, 37, 185-202.
Pöppel, E. (1997a). Grenzen des Bewusstseins. Frankfurt am Main: Insel Verlag.
Pöppel, E. (1997b). A hierarchical model of temporal perception. Trends in Cognitive Science, 1(2), 56-61.
Ranta-aho, M., Wilkins, M., & Egloff, P. (1998). JUPITER - Joint Usability, Performability and Interoperability Trials in Europe: EURESCOM: European Institute for Research and Strategic Studies in Telecommunications.
Röthlisberger, U. (1998). The Architecture of an Interactive Multimedia Communication System. PhD thesis, Swiss Federal Institute of Technology, Zurich.
Ruesch, J., & Bateson, G. (1951). Communication: The Social Matrix of Psychiatry. New York: W.W. Norton & Co.
Sanders, A. F. (1998). Elements of Human Performance: Reaction Processes and Attention in Human Skill. Mahwah, New Jersey: Lawrence Erlbaum Associates.
Schwender, D. et al. (1994). Anesthetic control of 40-Hz brain activity and implicit memory. Consciousness and Cognition, 3, 129-147.
Short, J., Williams, E., & Christie, B. (1976). The social psychology of telecommunication. London: Wiley.
Smith, R. L., Richetto, G. M., & Zima, J. P. (1972). Organizational behaviour: an approach to human communication. In R. W. Budd & B. D. Ruben (Eds.), Approaches to Human Communication (pp. 269-289). New York: Spartan Books.
Steinmetz, R. (1996). Human Perception of Jitter and Media Synchronization. IEEE Journal on Selected Areas in Communications, 14(1), 61-72.
Stern, L. W. (1897). Psychische Präsenzzeit. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 13, 325-349.
Sternberg, S. (1966). High-speed scanning in human memory. Science, 153, 652-654.
Stone, J. V., Hunkin, N. M., Porrill, J., Wood, R., Keeler, V., Beanland, M., Port, M., & Porter, N. R. (2001). When is now? Perception of simultaneity. Proceedings Biological Sciences: The Royal Society, 268, 31-38.
Stone, M. A., & Moore, B. C. (1999). Tolerable hearing aid delays. 1. Estimation of limits imposed by the auditory path alone using simulated hearing losses. Ear and Hearing, 20(3), 182-192.
Summerfield, Q. (1992). Lipreading and audio-visual speech perception. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 335(1273), 71-78.
Thomas, E. A. C., & Brown, I. (1974). Time perception and the filled-duration illusion. Perception & Psychophysics, 16, 449-458.
Treisman, M., Faulkner, A., Naish, P., & Brogan, D. (1990). The internal clock: evidence for a temporal oscillator underlying time perception with some estimates of its characteristic frequency. Perception, 19(6), 705-743.
Treutwein, B. (1995). Adaptive Psychophysical Procedures. Vision Research, 35(17), 2503-2522.
Van Hoesel, R., Ramsden, R., & O'Driscoll, M. (2002). Sound-direction identification, interaural time delay discrimination, and speech intelligibility advantages in noise for a bilateral cochlear implant user. Ear and Hearing, 23(2), 137-149.
Vaughan, H. G., & Arezzo, J. C. (1988). The neural basis of event-related potentials. In T. W. Picton (Ed.), Human Event-related Potentials, Handbook of Electroencephalography and Clinical Neurophysiology (Revised Series ed., Vol. 3, pp. 45-96). Amsterdam: Elsevier.
von Steinbüchel, N., Wittmann, M., & Pöppel, E. (1996). In M. A. Pastor & J. Artieda (Eds.), Time, Internal Clocks, and Movement (pp. 281-304): Elsevier.
Watzlawick, P., Bavelas, J. B., & Jackson, D. D. (1967). Pragmatics of Human Communication. New York: W.W. Norton Co.
Watzlawick, P., & Beavin, J. H. (1966). Einige formale Aspekte der Kommunikation. In B. Badura & K. Gloy (Eds.), Soziologie der Kommunikation: Eine Textauswahl zur Einführung. Stuttgart: Frommann.
Wearden, J. H., & Bray, S. (2001). Scalar timing without reference memory? Episodic temporal generalization and bisection in humans. The Quarterly Journal of Experimental Psychology, 54B(4), 289-309.
Wearden, J. H., Philpott, K., & Win, T. (1999). Speeding up and (... relatively ...) slowing down an internal clock in humans. Behavioural Processes(46), 63-73.
Weidenmann, B. (1988). Psychische Prozesse beim Verstehen von Bildern. Bern: Verlag Hans Huber.
Welch, R. B., & Warren, D. H. (1986). Intersensory interactions. In K. R. Kaufman & J. P. Thomas (Eds.), Handbook of Perception and Human Performance, Sensory Processes and Perception (Vol. 1, pp. 1-36). New York: Wiley.
Welford, A. T. (1980). Choice Reaction Time: Basic Concepts. In A. T. Welford (Ed.), Reaction Times (pp. 73-128). New York: Academic Press.
Wilkins, M., & Tuominen, J. (1998). Recommended Network Parameter Values for Acceptability Tests: EURESCOM: European Institute for Research and Strategic Studies in Telecommunications.
Wilson, G., & Sasse, M. A. (2000). Do Users Always Know What's Good For Them? Utilising Physiological Responses to Assess Media Quality. In S. McDonald & Y. Waern & G. Cockton (Eds.), People and Computers XIV - Usability or Else! Proceedings of HCI 2000 (pp. 327-339). Sunderland, UK: Springer.
Witherspoon, D., & Allan, L. G. (1985). Time judgments and the repetition effects in perceptual identification. Memory and Cognition, 13, 101-111.
Yamaguchi, H., Wada, M., & Yamamoto, H. (1986). A 64 kbit/s Integrated Visual Communication System - New Communication Medium for the ISDN. IEEE Selected Areas in Communication.
Zakay, D., & Block, R. A. (1998). New Perspective on Prospective Time Estimation. In V. De Keyser & G. Ydewalle & A. Vandierendonck (Eds.), Time and the Dynamic Control of Behavior. Hogrefe & Huber.
Zuberbühler, H. J. (2002). Rapid Evaluation of Perceptual Thresholds - The Best-Pest Calculator: A web-based application for non-expert users. [URL illegible in source]
Zuberbühler, H. J., Krueger, H., & Kündig, A. (2003). Delay Perception Thresholds in Human-Computer Interaction: Fundamentals for CSCW-Applications. GfA - XVII International Annual Occupational Ergonomics and Safety Conference, Munich.
Zuberbühler, H. J., Ruegg, S., Krueger, H., & Kündig, A. (2002). Intermedia Synchronisation in Network Design: Using an Adaptive Psychophysical Method to Specify the Perceivable Audio-Visual Delay. WWDU 2002 - Work With Display Units: World Wide Work, Berchtesgaden.
Zwicker, E., & Feldtkeller, R. (1967). Das Ohr als Nachrichtenempfänger. Stuttgart: S. Hirzel Verlag.
Glossary
2
2AFC 2-Alternative-Forced-Choice. See Forced-Choice Procedure.
A
Absolute Threshold Minimal detectable amount of stimulation.
Application In our context, application describes what kind of processes the end user is trying to support when using services of a public network. This interpretation is purposely wider than the meaning of an application program running on some computer; e.g. application may also mean that a phone call is made for some specific purpose.
ATM Asynchronous Transfer Mode: High-speed packet switching technology using small packets (cells) of fixed size (48 data + 5 header = 53 bytes). ATM is also known as fast packet.
B
Bandwidth Technically, the difference, in Hertz (Hz), between the highest and lowest frequencies of a transmission channel. However, as typically used, the amount of data that can be sent through a given communications circuit.
Best-PEST see PEST
Bit rate Number of binary digits that the network is capable of accepting and delivering per unit of time.
BPS Bits per Second: A measure of the data transfer rate of the data channel.
C
Circuit Switched Mode Operational mode of a telecommunication network where connections are set up from an end system A to any other end system B, with network resources reserved in the network for this connection along a fixed path. Within network nodes, a very low delay link is dedicated to each connection, and a fixed bandwidth (bit rate) is reserved on each link participating in a connection.
Client A computer system or program which communicates with another such, which provides special services (e.g. a workstation requesting the contents of a file from a file server is a client of the file server).
Codec Beginning and end point of a videoconferencing system. Codec is an acronym for compression decompression, compressor decompressor, or coder decoder. A codec compresses its video and audio input using computed algorithms. The compressed signal is adapted for transmission over a particular network.
Compression Mapping sets of bits produced by a source into a smaller number of bits to be transmitted. With compression, the original information content may be retained (so-called lossless compression) or reduced (so-called lossy compression). At the receiving side, suitable decompression algorithms restore the original information as far as feasible.
CSCW Computer-Supported Cooperative Work applications enable real-time collaboration among geographically distributed work group members. They typically include file transfer, chat, shared whiteboard, application sharing, voice, and video.
Ctrl Control: A key on a terminal or computer keyboard which modifies the effect of other (letter, number and some other) keys, in a similar way that the Shift key makes letter keys generate capital letters.
D
Difference Threshold Smallest detectable difference between two stimuli, the just noticeable difference (also called jnd or Differenz Limen, DL).
DVD Digital Versatile Disc is an optical disc technology that holds 4.7 gigabytes of information on one of its two sides, or enough for a 133-minute movie. With two layers on each of its two sides, it will hold up to 17 gigabytes of video, audio, or other information. DVD uses the MPEG-2 file and compression standard.
F
Forced-Choice Procedure The observer is given two or more observation intervals, one of which contains a signal. The observer is required to choose which observation interval contained the signal.
H
HCI Human-Computer Interaction is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them.
HHI In our context, Human-Human Interaction concerns information exchange between two or more users, over an intermediary computer and/or communication network.
I
IEEE Institute of Electrical and Electronics Engineers (US): Professional society which sets standards.
Internet The global collection of interconnected regional and wide-area networks which use IP as the network layer protocol.
IP Internet Protocol: The network layer protocol, which describes a packet format for data to pass on a TCP/IP network and on the Internet. It is a connectionless, best-effort packet switching protocol.
ISDN Integrated Services Digital Network: A switched digital network operating in circuit-switched mode. International standard for digital phone and other services.
L
LAN Local Area Network: A network spanning a small physical area (e.g. building or campus) and operating at high speed (typically 10 - 100 Mbit/sec).
Layer Communication networks for computers may be organized as a set of more or less independent protocols, each in a different layer (or level). The lowest layer governs direct host-to-host communication between the hardware at different hosts; the highest consists of user applications. For each layer, programs at different hosts use protocols appropriate to the layer to communicate with each other. TCP/IP has five layers of protocols; OSI has seven. OSI layers:
physical: converts data bits (1s and 0s) into electrical (or optical) signals (specifying signal levels and timing) to allow transfer of data across parts of a network
data link: frames data into packets and checks the data transferred by level 1 to correct transmission errors (or retransmit lost data), and controls the speed and direction of flow of data between end points of the network
network: controls addressing and routing of data through the network, controlling congestion, negotiating packet sizes and protocols between networks, and accounting and billing for data transferred
transport: provides end-to-end data transport between users or processes on different machines, interfacing with the network layer to present network connections of appropriate types to the higher layers (e.g. an error-corrected point-to-point channel, transport of messages without guaranteed delivery, or broadcasting of messages to multiple destinations)
session: allows higher layers to establish sessions across end-to-end transport links, controlling the direction of communications, providing tokens to regulate operations carried out across the link, and synchronising operations
presentation: performs conversion of data between end-systems' internal representations (e.g. ASCII or EBCDIC coding for characters, one's complement and two's complement representation of numbers etc) and abstract data structures, enabling interchange of data between different systems; and data compression and encryption
application: converts between specific characteristics of end-systems' hardware and software and virtual models, enabling applications to run between different systems (e.g. general file-transfer protocols use a model of a file system which is mapped into specific systems' representations of files' names, format, encoding etc; similarly for email, directory lookup, remote job entry, terminal emulation etc)
M
Method of Least Squares Method for determining the parameters of a predefined function that best fits a set of data points in which, for each point, the Y value is plotted as a function of its X value. The method minimises the sum of the squared deviations of the Y values from the fitted function.
GLOSSARY
MaximumLikelihood
Methods
MPEG
Monte-CarloSimulation
MPLS
N
125
Adaptive procedures for measuring threshold in which the intensity of thestimulus presented on each trial is determined by a statistical estimation of theobserver's threshold that is made from all of the results obtained from the beginning of the test run.
Moving Picture Experts Group develops standards for digital video and digitalaudio compression. It operates under the auspices of the International Organization for Standardization (ISO). The MPEG standards are an evolvingseries, each designed for a different purpose. MPEG-2 images have four timesthe resolution of MPEG-1 images and can be delivered at 60 interlaced fieldsper second where two fields constitute one image frame. (MPEG-1 can deliver 30 noninterlaced frames per second.)
A computer simulation with a built-in random process, allowing for testingdifferent possible outcomes of a hypothesized model.
MultiProtocol Label Switching. A data transfer mode blending the characteristicsofIP and ATM. For a detailed description see e.g. (Armitage,2000).
nAFC n-Alternative-Forced Choice. Psychophysical testing paradigm in which the experimental subject is forced to choose in which of n possibilities the stimulus lies.
Network A set of interconnected computers, peripherals and terminals. Its purpose is to enable each computing service to be accessed from other computers and terminals. Consists of an ensemble of switching nodes and transmission links; includes for mobile services all entities supporting mobile end-systems roaming through different cells of the network or even moving from one administrative domain to some other domain.
Network service An application available on a network, e.g. electronic mail, file transfer, job transfer or interactive terminal connection.
N-ISDN Narrowband ISDN: Two 64 Kbps channels plus one 16 Kbps signalling channel
O
OSI Open Systems Interconnection: A model developed by ISO (International Organization for Standardization) to allow computer systems made by different vendors to communicate with each other. The goal of OSI is to create a worldwide open systems networking environment where all systems can interconnect.
OSI reference model ISO model for communication between equipment and networks - the famous 7-layer model.
P
Packet A block of information with a defined format containing control information and data. "Packet" is a generic term used to describe units of data at all levels of the protocol stack, but it is most correctly used to describe application data units.
Packet Switched Mode Operational mode of a telecommunication network where information is conveyed in packets of constant or variable length, with packets undergoing temporary storage in nodes. Both the resources within nodes and on links are allocated dynamically, such that, on a statistical basis, a better resource utilization is achieved for bursty traffic. There are two variants of packet mode: (1) with connectionless operation, no network resources are reserved for a particular end-user, i.e. the network is operating in a so-called best-effort mode (no QoS guarantee); (2) with connection-oriented operation, network resources are reserved for a so-called virtual connection such that some QoS guarantees (such as sustainable bit rate or limited delay) can be given.
Perception The interpretation of sensory information to produce an internal representation of the world.
PEST Parameter Estimation by Sequential Trials. Adaptive psychophysical testing method.
Positive response rate Ψ Rate of 'YES' answers in the yes-no paradigm, or rate of correct answers in the forced-choice paradigm.
POTS Plain Old Telephone Service: The service provided by the conventional analogue telephone network, i.e. circuit-switched analogue connections with a bandwidth of 3.1 kHz. Its digital equivalent is provided by ISDN.
Pragmatics The study of language seen in relation to its users; a branch of semiotics.
Protocol A formal description of message formats and the rules two computers must follow to exchange those messages. Protocols can describe low-level details of machine-to-machine interfaces (e.g. the order in which bits and bytes are sent across a wire) or high-level exchanges between application programs (e.g. the way in which two programs transfer a file across the Internet).
Q
QoS Quality of Service: Formal definition of quality for some specific telecommunication service, using specific parameters. A certain QoS may be agreed by a network user and the network operator at different instances and for different durations, i.e. its validity may be limited to a connection (or even only part thereof), or it may be the subject of a so-called service level agreement. For a detailed description see (Fluckiger, 1995).
R
Response Bias A tendency for the observer to favour one response over another, which is determined by factors other than the intensity of the stimulus.
Retinotopy The notion that receptor cells in the retina are mapped to points e.g. on the surface of the visual cortex.
Return trip delay The elapsed time between the emission of the first bit of a data block and its reception by the same end-system after the block has been echoed by the destination end-system.
S
Semantics The study of meanings; a branch of semiotics.
Semiotics The science of signs and/or sign systems.
Sensation Process of detecting a stimulus or some aspect of it.
SMS Short Message Service: An e-mail service with very limited capabilities offered in the framework of the GSM mobile phone system.
Source Coding Bringing the raw information produced by a source into a form suitable for transmission. Usually involves A/D conversion and may involve compression.
Syntax The rules by which signs are combined to make statements; a branch of semiotics.
T
Threshold In our context, the term threshold describes what elsewhere is referred to as Empirical or Statistical Threshold: The intensity of a stimulus required for a specified level of performance by an observer. Examples are the intensity of the stimulus corresponding to reporting the stimulus 50% of the time in the yes-no paradigm, or correctly detecting the stimulus 75% of the time in a 2AFC paradigm. See also Absolute Threshold, and Difference Threshold.
TCP/IP TCP/IP usually refers to the suite of transport and application protocols, especially TCP, which run over IP.
Throughput see bit rate
U
UMTS Universal Mobile Telecommunications System. UMTS is one of the Third Generation mobile systems being developed within the framework defined by the International Telecommunication Union (ITU) and known as IMT-2000. It seeks to build on and extend the capability of today's mobile, cordless and satellite technologies by providing increased capacity, data capability and a greater range of services.
Underflow A condition that can occur when the result of a floating-point operation would be smaller in magnitude than the smallest quantity representable. Underflow is actually negative overflow of the exponent. For example, a result less than 10^-128 would cause underflow.
V
VoIP Voice over IP. Sometimes called Internet telephony, IP telephony, or Voice over the Internet (VOI). A category of hardware and software that enables people to use the Internet as the transmission medium for telephone calls. For users who have free or fixed-price Internet access, Internet telephony software essentially provides free telephone calls anywhere in the world. There are many Internet telephony applications available. Some come bundled with popular Web browsers, others are stand-alone products.
W
Weber's Law States that the size of the just noticeable difference (see also Difference Threshold) is a constant proportion of the original stimulus value.
WAN Wide Area Network: Network extending over large distances/areas (typically 10-1000 km), operating at relatively slow speeds (10 kbit/s - 10 Mbit/s).
WWW World Wide Web. Hypertext-based distributed information service, created by researchers at CERN in Switzerland. WWW uses the HyperText Markup Language (HTML) for its formatting, and interfaces for various systems are available. Users may create, edit or browse hypertext documents.
Y
yes-no Psychophysical testing paradigm in which an experimental subject has to answer after each presentation whether she/he detected the stimulus. The presentations contain a predefined percentage of stimuli.
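Several of the psychophysical terms defined above (yes-no paradigm, positive response rate, threshold, Monte-Carlo simulation, Weber's Law) can be related to each other in a small simulation. The following Python sketch is an illustration only and is not taken from the thesis: the logistic shape of the psychometric function, the function names and all numerical values (a threshold of 100, a slope of 0.05, a Weber fraction of 0.1) are assumptions chosen for the example.

```python
import math
import random


def psychometric(intensity, threshold, slope):
    """Assumed logistic psychometric function: probability of a 'yes' answer."""
    return 1.0 / (1.0 + math.exp(-slope * (intensity - threshold)))


def positive_response_rate(intensity, threshold, slope, trials, rng):
    """Monte-Carlo estimate of the rate of 'yes' answers in a yes-no run."""
    yes = sum(rng.random() < psychometric(intensity, threshold, slope)
              for _ in range(trials))
    return yes / trials


def weber_jnd(reference, weber_fraction):
    """Weber's Law: the just noticeable difference is a constant
    proportion (weber_fraction) of the reference stimulus value."""
    return weber_fraction * reference


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    for level in (50, 100, 150):  # hypothetical stimulus intensities
        rate = positive_response_rate(level, 100.0, 0.05, 2000, rng)
        print(level, round(rate, 2))
    print(weber_jnd(200.0, 0.1))
```

With these assumed parameters the simulated observer answers 'yes' on roughly half of the trials at the threshold intensity, which matches the 50% criterion given for the yes-no paradigm in the Threshold entry; the rate falls below and rises above that intensity.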
Index
2AFC 54, 66, 77, 121
30-ms-hypothesis 47
3-seconds-hypothesis 45
accumulator 52
action potential 49
adaptive psychophysical procedure 55, 59
affective judgement 20
ambiguity 29
amplifying principle 11
anaesthesia 48
ARES 75
arousal 52
ATM 9, 74, 75, 121
attention 45, 52, 96
attribution 20
audiometer Bosch ST10 68
axon 49
backchannel 95
background noise 68
background traffic 75
bandwidth 10, 121
beat frequency 50
best-PEST 55, 60, 66, 77, 78
bit rate 10, 121
brain activity 42
CCD-Camera 76
circadian rhythm 48
circuit-switched 1, 9
    definition of 122
classic conditioning 52
client 75, 101, 122
cochlear nucleus 49
coding 29, 38
    non-verbal 31, 38
    verbal 30, 38
collectivism 24
communication 23
    audio-visual 15, 39
    business 13, 26
    dialog 34
    face-to-face 13, 39
    formal 26
    informal 26
    interactive 34
    interpersonal 24
    layered model of 24
    multimodal 32, 97
    private 13
    taxonomy of 23
comparator 53
compression 10, 15, 122
content provider 85
cortex
    auditory 39, 42, 88
    visual 39, 42, 88
CSCW 13, 122
culture 24
decision criterion 66
degree of interactivity 77, 94, 96
de-interlacing 76
delay 15
    absolute 3, 18, 35, 36, 39, 89
    interaural 49
    relative 3, 17, 37, 39, 85
    return trip 3, 127
    roundtrip 3, 35, 36
    transit 36
digitiser 76
distractor 95
distribution
    cumulative normal 59
    gaussian 53, 87, 90
    logistic 59, 90
    log-normal 90
    Poisson 90
    response 47, 51
    right-skewed 90
    Weibull 59
DVD 76, 122
electroencephalography (EEG) 41
ETHMICS 75
event-related potential (ERP) 41, 88
explicit handover 95
facilitation
    cross-modal 41
    intensity 41
false alarm 55
false negative 55, 66, 104
false positive 55, 66, 104
forced-choice 54, 123
formality 26, 37
graphics-card 76
human-computer-interaction (HCI) 3, 65, 85, 89
    definition of 123
human-human-interaction (HHI) 3, 74, 91
    definition of 123
individualism 24
inflection point 57
information
    prosodic 15
    temporal 52
interaction
    basic auditory 74, 77, 91
    basic visual 74, 77, 91
    click-visual 89
    realistic audio-visual 74, 78
    realistic auditory 74, 78
    synchronous 43
    unit of 34
    voice-visual 89
intermedia synchronisation 3, 37
interruption 95
inverse function 86
IP 1, 9, 123
ISDN 1, 9, 123
isometric perspective 44
JPEG-decoder 76
JPEG-encoder 76
labelled lines 49
LAN 18, 123
Landolt-rings acuity chart 68
Lingo 66
lip synchronisation 3, 37
logarithmic transformation 62
logistic 56, 87
Macromedia 66, 101
man-machine interaction 13
marginal utility 110
masking principle 11
maximum likelihood 59, 125
media richness 95
memory
    long-term 53
    reference 53
    short-term 47, 53
    working 53
mental construct system 20
method of least squares 70, 79, 82, 124
miss 55
modality 31, 39, 53
    auditory 39
    visual 39
mode 66, 102
Monte-Carlo simulation 107, 125
MPEG 2 76, 125
MPLS 9, 125
n-alternative-forced-choice (nAFC) 54, 102
Necker cube 45
network 1, 10, 74, 125
network planner 85
network service 10, 15, 125
neural networks 50
neuron 49, 50
number of turns 95
orientation 27, 38
    content 28
    non-person 27, 38
    person 27, 38
    relationship 28
oscillations
    neuronal 47
OSI 24, 39, 125
pacemaker 50, 52, 53
pacemaker-switch-accumulator 52
packet-switched 1, 9
    definition of 126
pattern recognition 13, 48
perception 126
perceptual store 52
population clocks 50
population models 50
POTS 11, 126
pragmatics 31, 126
present
    abstract connotation of 44
    subjective 44
pre-test 67, 104
processing
    high frequency 46
    low frequency 44
    microsecond 49
processing time
    auditory 40, 87
    visual 40, 87
psychometric function 55
    elevation 61
    scaling 61
psychophysics 40
    definition 19
    theory 54
pulse 52, 53
QED 11
Quality of Service 1, 10
    definition of 127
reaction time 40
    choice 41
    recognition 41
    simple 41
response
    correct 54
    positive 55, 126
    yes 55
response bias 90, 127
retinotopy 48, 127
sampling point 61
scaling 76
semantics 31, 127
semiotics 31, 127
sensation 31, 127
service provider 85
shared flat 78
sink 35
slope 57, 104
smallest step size 66, 103
SMS 10, 127
social context 26, 37
source
    coding of 10, 36, 127
    decoding of 36
start value 66, 103
step function 58
stimulus 54
    absence of 54
    auditory 53
    filled 53
    intensity of 53, 55, 56
    moving 53
    offset of 67
    onset of 67
    order 67
    presence of 54
    producer of 74
    receiver of 74
    static 53
    visual 53, 68, 88
stochastic observer 107
superior olivary complex 49
switch 52, 53
    ATM 75
    ethernet 75
SYMLOG 28
synchronisation error 3
syntax 31, 127
telesurgery 85
temporal information processing (TIP) 52
temporal pattern 14
temporal reproduction 45
termination criterion 66, 103
threshold 128
    absolute delay 65, 74
    acceptance 14, 83
    definition of 56
    difference 92, 103, 122
    hearing 68
    measuring 54, 59
    perception 14, 83
    relative delay 65
    temporal order 46
throughput 15, 128
time
    magnitudes of 48
    perception of 48
timing 34, 39
    asynchronous 34
    synchronous 34
topographic map 49
trigger
    mouse 69
    vocal 68
t-test 70, 79, 82
two-alternative forced-choice 54
UDP 75
UMTS 13, 31, 128
underflow 62, 128
videoconference 1, 37, 38, 75, 78
visual acuity 68
voice-over-IP 1, 128
WAN 18, 129
Weber's Law 51, 128
WWW 13, 129
yes-no 54, 78, 102
    definition of 129
About the Author
Hans-Jörg Zuberbühler was born on 11 February 1968 in St. Gallen, Switzerland. After primary school, he completed an apprenticeship at the Swiss Federal Laboratories for Materials Testing and Research (EMPA) in St. Gallen. After some years of industrial experience, he studied environmental sciences and ergonomics at the Swiss Federal Institute of Technology (ETH) in Zurich. In 1999 he received a master's degree with a thesis about human motion perception and its impact on the acquisition of procedural knowledge. Since then he has been employed as a research assistant at the Institute for Hygiene and Applied Physiology at the ETH. His research interests comprise the fields of cognitive ergonomics, sensory physiology and psychophysics as well as methodological issues.