Evaluation and metrics: Measuring the effectiveness of virtual environments
Doug Bowman
Edited by C. Song
(C) 2005 Doug Bowman, Virginia Tech
11.2.2 Types of evaluation
Cognitive walkthrough
Heuristic evaluation
Formative evaluation: observational user studies; questionnaires, interviews
Summative evaluation: task-based usability evaluation; formal experimentation
Sequential evaluation
Testbed evaluation
11.5 Classifying evaluation techniques
Techniques are classified along three dimensions: user involvement, context of evaluation, and type of results.

| Context | Results | Requires users | Does not require users |
| --- | --- | --- | --- |
| Generic | Quantitative | Formal summative evaluation; post-hoc questionnaire | (Generic performance models for VEs, e.g. Fitts's law) |
| Generic | Qualitative | Informal summative evaluation; post-hoc questionnaire | Heuristic evaluation |
| Application-specific | Quantitative | Formative evaluation; formal summative evaluation; post-hoc questionnaire | (Application-specific performance models for VEs, e.g. GOMS) |
| Application-specific | Qualitative | Formative evaluation (informal and formal); post-hoc questionnaire; interview / demo | Heuristic evaluation; cognitive walkthrough |
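The slides cite Fitts's law as an example of a generic performance model for VEs. A minimal sketch of the Shannon formulation, MT = a + b * log2(D/W + 1), is below; the constants `a` and `b` are hypothetical placeholders, since in practice they are fit by regression to data measured for a specific device and technique.

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time (s) for a pointing task via Fitts's law.

    a, b are hypothetical regression constants for illustration only.
    """
    index_of_difficulty = math.log2(distance / width + 1)  # bits
    return a + b * index_of_difficulty

# e.g. a target 7 units away and 1 unit wide: ID = log2(8) = 3 bits
print(fitts_movement_time(7, 1))
```

Because the model is generic, it can predict relative performance for targets the experiment never tested, which is exactly what makes this cell of the classification valuable without users.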
11.4 How VE evaluation is different
Physical issues
- User can't see the real world in an HMD
- Think-aloud protocols and speech input are incompatible

Evaluator issues
- The evaluator can break presence
- Multiple evaluators are usually needed
11.4 How VE evaluation is different (cont.)
User issues
- Very few expert users
- Evaluations must include rest breaks to avoid possible sickness

Evaluation type issues
- Lack of heuristics/guidelines
- Choosing independent variables is difficult
11.4 How VE evaluation is different (cont.)
Miscellaneous issues
- Evaluations must focus on lower-level entities (interaction techniques, ITs) because of a lack of standards
- Results are difficult to generalize because of differences between VE systems
11.6.1 Testbed evaluation framework
Main independent variables: ITs
Other considerations (independent variables):
- task (e.g. target known vs. target unknown)
- environment (e.g. number of obstacles)
- system (e.g. use of collision detection)
- user (e.g. VE experience)

Performance metrics (dependent variables): speed, accuracy, user comfort, spatial awareness, …
Generic evaluation context
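The independent and dependent variables above can be captured in a simple trial record. This is a hypothetical sketch, not part of the framework itself; the field names and example values are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    """One testbed trial: independent variables plus performance metrics."""
    technique: str        # main independent variable (the IT)
    task: str             # e.g. "target known" vs. "target unknown"
    environment: str      # e.g. number of obstacles
    system: str           # e.g. collision detection on/off
    user_experience: str  # e.g. novice vs. expert VE user
    time_s: float         # speed (dependent variable)
    error_m: float        # accuracy (dependent variable)
    comfort: int          # subjective rating, e.g. 1-7

t = Trial("pointing", "target known", "10 obstacles", "collision on",
          "novice", time_s=4.2, error_m=0.12, comfort=5)
print(t.time_s)
```

Keeping every outside factor in the record is what later lets results be filtered by task, environment, system, or user when generalizing.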
Testbed evaluation
The framework proceeds through numbered stages:
1. Initial evaluation
2. Taxonomy
3. Outside factors (task, users, environment, system)
4. Performance metrics
5. Testbed evaluation
6. Quantitative performance results
7. Heuristics & guidelines
8. User-centered application
Taxonomy
Establish a taxonomy of interaction techniques for the interaction task being evaluated.

Example task: changing an object's color. Three subtasks:
- selecting the object
- choosing a color
- applying the color

Two possible technique components (TCs) for choosing a color:
- changing the values of R, G, and B sliders
- touching a point within a 3D color space
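A taxonomy like this defines a combinatorial design space: a complete technique picks one TC per subtask. The sketch below encodes the color-changing example; the two TCs for choosing a color come from the slide, while the TCs for selecting and applying are assumed examples.

```python
from itertools import product

# Subtask -> candidate technique components (TCs).
# "select object" and "apply color" TCs are assumed for illustration.
taxonomy = {
    "select object": ["ray-casting", "virtual hand"],
    "choose color":  ["RGB sliders", "3D color-space touch"],
    "apply color":   ["button press", "voice command"],
}

# A complete interaction technique picks one TC per subtask.
complete_techniques = list(product(*taxonomy.values()))
print(len(complete_techniques))  # 2 * 2 * 2 = 8 candidate techniques
```

Enumerating the space this way makes it easy to see which combinations a testbed has covered and which remain untested.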
Outside Factors
A user's performance on an interaction task may depend on a variety of factors, in four categories:
- Task: distance to be traveled, size of the object to be manipulated
- Environment: number of obstacles, level of activity or motion
- User: spatial awareness, physical attributes (arm length, etc.)
- System: lighting model, mean frame rate, etc.
Performance Metrics
Information about human performance:
- Speed and accuracy: quantitative
- More subjective performance values: ease of use, ease of learning, and user comfort
- User-centric performance measures relating to the user's senses and body
Testbed Evaluation
The final stage in the evaluation of interaction techniques for 3D interaction tasks.

Generic, generalizable, and reusable evaluation through the creation of testbeds.

Testbeds: environments and tasks that
- involve all important aspects of a task
- evaluate each component of a technique
- consider outside influences on performance
- have multiple performance measures
Application and Generalization of Results

Testbed evaluation produces models that characterize the usability of an interaction technique for the specified task. Usability is given in terms of multiple performance metrics with respect to various levels of outside factors, forming a performance database (DB). More information is added to the DB each time a new technique is run through the testbed.

To choose interaction techniques for applications appropriately, one must understand the interaction requirements of the application. The performance results from testbed evaluation can then be used to recommend interaction techniques that meet those requirements.
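The recommendation step can be sketched as a query over such a performance database. Everything below (techniques, factor levels, numbers) is invented for illustration; only the pattern of matching application requirements against stored testbed results reflects the slide.

```python
# Hypothetical performance DB: (technique, environment) -> measured metrics.
perf_db = {
    ("pointing", "many obstacles"): {"time_s": 3.1, "accuracy": 0.90},
    ("grabbing", "many obstacles"): {"time_s": 4.7, "accuracy": 0.97},
    ("pointing", "few obstacles"):  {"time_s": 2.2, "accuracy": 0.96},
}

def recommend(environment, min_accuracy):
    """Fastest technique meeting the application's accuracy requirement."""
    candidates = [(m["time_s"], tech)
                  for (tech, env), m in perf_db.items()
                  if env == environment and m["accuracy"] >= min_accuracy]
    return min(candidates)[1] if candidates else None

print(recommend("many obstacles", 0.95))  # grabbing
```

Note how loosening the accuracy requirement changes the answer: with `min_accuracy=0.85`, the faster "pointing" technique wins instead, which is why applications must state their requirements before querying the DB.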
11.6.2 Sequential evaluation
Traditional usability engineering methods
Iterative design/eval.
Relies on scenarios, guidelines
Application-centric
The cycle iterates through four activities, each producing or consuming artifacts:
1. User task analysis → (A) task descriptions, sequences & dependencies
2. Heuristic evaluation, guided by (B) guidelines and heuristics → (C) streamlined user interface designs
3. Formative user-centered evaluation, using (D) representative user task scenarios → (E) iteratively refined user interface designs
4. Summative comparative evaluation → user-centered application
11.3 When is a VE effective?
Users’ goals are realized
User tasks done better, easier, or faster
Users are not frustrated
Users are not uncomfortable
11.3 How can we measure effectiveness?
System performance
Interface performance / User preference
User (task) performance
All are interrelated
Effectiveness case studies
Watson experiment: how system performance affects task performance
Slater experiments: how presence is affected
Design education: task effectiveness
11.3.1 System performance metrics
Avg. frame rate (fps)
Avg. latency / lag (msec)
Variability in frame rate / lag
Network delay
Distortion
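Several of these metrics can be derived from one log of per-frame times. The sketch below is illustrative: `render_frame` stands in for the application's real per-frame work, and the function names are invented.

```python
import time
import statistics

def measure_system_performance(render_frame, n_frames=100):
    """Time a render loop and derive avg frame rate, frame time,
    and variability in frame time (three of the slide's metrics)."""
    frame_times = []
    for _ in range(n_frames):
        start = time.perf_counter()
        render_frame()  # placeholder for the VE's per-frame work
        frame_times.append(time.perf_counter() - start)
    avg = statistics.mean(frame_times)
    return {
        "avg_frame_time_ms": avg * 1000.0,
        "avg_fps": 1.0 / avg,
        "frame_time_stdev_ms": statistics.pstdev(frame_times) * 1000.0,
    }

# Simulate a ~1 ms frame to exercise the measurement loop.
metrics = measure_system_performance(lambda: time.sleep(0.001), n_frames=20)
```

Reporting the standard deviation alongside the mean matters because, as the Watson case study below shows, variability in frame rate affects task performance independently of the average.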
System performance
Only important for its effects on user performance / preference:
- frame rate affects presence
- net delay affects collaboration
Necessary, but not sufficient
Case studies - Watson
How does system performance affect task performance?
Vary avg. frame rate, variability in frame rate
Measure perf. on closed-loop, open-loop task
e.g. B. Watson et al, Effects of variation in system responsiveness on user performance in virtual environments. Human Factors, 40(3), 403-414.
11.3.3 User preference metrics
Ease of use / learning
Presence
User comfort
Usually subjective (measured in questionnaires, interviews)
User preference in the interface
UI goals:
- ease of use
- ease of learning
- affordances
- unobtrusiveness
- etc.
Achieving these goals leads to usability
Crucial for effective applications
Case studies - Slater
questionnaires
assumes that presence is required for some applications
e.g. M. Slater et al, Taking Steps: The influence of a walking metaphor on presence in virtual reality. ACM TOCHI, 2(3), 201-219.
study effects of: collision detection, physical walking, virtual body, shadows, movement
User comfort
Simulator sickness
Aftereffects of VE exposure
Arm/hand strain
Eye strain
Measuring user comfort
Rating scales
Questionnaires: Kennedy's SSQ (Simulator Sickness Questionnaire)

Objective measures: Stanney's work on measuring aftereffects
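SSQ scoring weights raw subscale sums by published constants (Nausea 9.54, Oculomotor 7.58, Disorientation 13.92, Total 3.74, from Kennedy et al. 1993). The sketch below applies only this weighting step; the mapping of the 16 symptom items onto subscales is omitted, so `raw` is assumed to already hold the unweighted subscale sums. Consult the original questionnaire before using this in a study.

```python
# Published SSQ subscale weights (Kennedy et al. 1993).
SSQ_WEIGHTS = {"nausea": 9.54, "oculomotor": 7.58, "disorientation": 13.92}
SSQ_TOTAL_WEIGHT = 3.74

def ssq_scores(raw):
    """raw: dict of unweighted subscale sums, e.g. {'nausea': 2, ...}.
    Returns weighted subscale scores plus the total severity score."""
    scores = {name: raw[name] * w for name, w in SSQ_WEIGHTS.items()}
    scores["total"] = sum(raw.values()) * SSQ_TOTAL_WEIGHT
    return scores

print(ssq_scores({"nausea": 2, "oculomotor": 3, "disorientation": 1}))
```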
11.3.2 Task performance metrics
Speed / efficiency
Accuracy
Domain-specific metrics:
- Education: learning
- Training: spatial awareness
- Design: expressiveness
Speed-accuracy tradeoff
Subjects will make a decision about where to operate on the speed-accuracy curve

Must explicitly look at particular points on the curve

Manage the tradeoff (speed vs. accuracy)
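One common way to look at explicit points on the curve is to run the same task under different instruction emphases. The data below are invented purely to illustrate the pattern: as emphasis shifts toward speed, completion time falls while error rate rises, which a single aggregate score would hide.

```python
# Invented illustrative data: three instruction conditions sampling
# different operating points on the speed-accuracy curve.
conditions = {
    "accuracy-emphasis": {"mean_time_s": 6.1, "error_rate": 0.03},
    "neutral":           {"mean_time_s": 4.4, "error_rate": 0.08},
    "speed-emphasis":    {"mean_time_s": 2.9, "error_rate": 0.21},
}

by_speed = sorted(conditions, key=lambda c: conditions[c]["mean_time_s"])
by_error = sorted(conditions, key=lambda c: conditions[c]["error_rate"])
# In this data the fastest condition is also the least accurate:
print(by_speed == list(reversed(by_error)))  # True
```

Comparing conditions pointwise like this is what "explicitly look at particular points on the curve" means in practice.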
Case studies: learning
Measure effectiveness by learning vs. control group
Metric: standard test
Issue: time on task not the same for all groups
e.g. D. Bowman et al. The educational value of an information-rich virtual environment. Presence: Teleoperators and Virtual Environments, 8(3), June 1999, 317-331.
Aspects of performance
Effectiveness comprises three interrelated aspects: system performance, interface performance, and task performance.
11.7 Guidelines for 3D UI evaluation
Begin with informal evaluation
Acknowledge and plan for the differences between traditional UI and 3D UI evaluation
Choose an evaluation approach that meets your requirements
Use a wide range of metrics – not just speed of task completion
Guidelines for formal experiments
Design experiments with general applicability:
- generic tasks
- generic performance metrics
- easy mappings to applications
Use pilot studies to determine which variables should be tested in the main experiment
Look for interactions between variables – rarely will a single technique be the best in all situations
Acknowledgments
Deborah Hix
Joseph Gabbard