COSC 426 Lect. 7: Evaluating AR Applications
Post on 29-Nov-2014
Lecture 7: Evaluating AR Applications
Mark Billinghurst, HIT Lab NZ
University of Canterbury
Building Compelling AR Experiences
experiences: Evaluation
applications: Interaction
tools: Authoring
components: Tracking, Display
Sony CSL © 2004
Introduction
The Interaction Design Process
Why Evaluate AR Applications?
To test and compare interfaces, new technologies, interaction techniques
Test usability (learnability, efficiency, satisfaction, ...)
Get user feedback
Refine interface design
Better understand your end users
...
Survey of AR Papers
Edward Swan (2005)
Surveyed major conferences/journals (1992-2004)
- Presence, ISMAR, ISWC, IEEE VR
Summary
1104 total papers
266 AR papers
38 AR HCI papers (Interaction)
21 AR user studies
Only 21 of 266 AR papers had a formal user study
Less than 8% of all AR papers
AR Papers
HIT Lab NZ Usability Survey
A Survey of Evaluation Techniques Used in Augmented Reality Studies
Andreas Dünser, Raphaël Grasset, Mark Billinghurst
Reviewed publications between 1993 and 2007
Extracted 6071 papers which mentioned “Augmented Reality”
Searched to find 165 AR papers with user studies
[Figure: chart over the years 1992-2007 (vertical axis 0-450). Legend of searched databases: ACM Digital Library, SpringerLink, IEEE Xplore Journals, ScienceDirect, SPIE Digital Library, InformaWorld, MIT Press Journals, Highwire, Blackwell Synergy, Mary Ann Liebert, Wiley Interscience, Sage Journals Online, Emerald Insight, Oxford Journals, Cambridge Journals Online, ASCE Publications, JSTOR, Karger, WorldSciNet, BioMed Central, ASME, Annual Reviews, Nature Online, MathSciNet, National Research Council of Canada Research Press (NRC), AdisOnline, APS Journals (PROLA), Royal Society Publishing]
Types of User Studies
Types of AR user studies
Perception
User Performance
Collaboration
Usability of Complete Systems
Types of AR User Studies
Types of Experimental Measures Used
Objective measures
Subjective measures
Qualitative analysis
Usability evaluation techniques
Informal evaluations
Types of Experimental Measures Used
Summary
Over last 10 years
Most user studies focused on user performance
Fewest user studies on collaboration
Objective performance measures most used
Qualitative and usability measures least used
Types of User Evaluation
What is evaluation?
Evaluation is concerned with gathering data about the usability of a design or product by a specified group of users for a particular activity within a specified environment or work context
Evaluation
Goal: Measure goodness of the application design
Two types:
Formative evaluation is performed at different stages of development to check that the product meets users’ needs.
Summative evaluation assesses the quality of a finished product.
Focusing on Formative Evaluation
When to evaluate?
Once the application has been developed
pros: rapid development, small evaluation cost
cons: rectifying problems
[Diagram: design, implementation, evaluation, redesign & reimplementation]
During design and development
pros: find and rectify problems early
cons: higher evaluation cost, longer development
[Diagram: design, implementation]
Four evaluation paradigms
‘quick and dirty’
usability testing (lab studies)
field studies
predictive evaluation
Quick and dirty
‘Quick & dirty’ evaluation: informal feedback from users or consultants to confirm that the designers’ ideas are in line with users’ needs and are liked.
Quick & dirty evaluations are done any time.
Emphasis is on fast input to the design process rather than carefully documented findings.
Usability Testing
Recording typical users’ performance on typical tasks in controlled settings. Field observations may be used.
As the users perform these tasks they are watched & recorded on video & their inputs are logged.
This data is used to calculate performance times, errors & help explain why the users did what they did.
User satisfaction questionnaires & interviews are used to elicit users’ opinions.
Laboratory-based Studies
Laboratory-based studies:
can be used for evaluating the design, or the implemented system
are carried out in an interruption-free usability lab
can accurately record some work situations
some studies are only possible in a lab environment
some tasks can be adequately performed in a lab
are useful for comparing different designs in a controlled context
Laboratory-based Studies
Controlled, instrumented environment
Field Studies
Field studies are done in natural settings
The aim is to understand what users do naturally and how technology impacts them.
In product design, field studies can be used to:
- identify opportunities for new technology
- determine design requirements
- decide how to introduce new technology
- evaluate technology in use.
Predictive Evaluation
Experts apply their knowledge of typical users, guided by heuristics, to predict usability problems.
Can involve theoretically based models.
A key feature of predictive evaluation is that real end users need not be present
Relatively quick and inexpensive
Characteristics of Approaches
         | Usability testing | Field studies | Predictive
Users    | do task           | natural       | not involved
Location | controlled        | natural       | anywhere
When     | prototype         | early         | prototype
Data     | quantitative      | qualitative   | problems
Feedback | measures & errors | descriptions  | problems
Type     | applied           | naturalistic  | expert
Evaluation Approaches and Methods
Method         | Usability testing / Field studies / Predictive
Observing      | x x
Asking users   | x x
Asking experts | x x
Testing        | x
Modeling       | x
DECIDE: A framework to guide evaluation
- Determine the goals the evaluation addresses.
- Explore the specific questions to be answered.
- Choose the evaluation paradigm and techniques.
- Identify the practical issues.
- Decide how to deal with the ethical issues.
- Evaluate, interpret and present the data.
DECIDE Framework
Determine Goals:
What are the high-level goals of the evaluation?
Who wants the evaluation and why?
Explore the Questions:
Create well defined, relevant questions
Choose the Evaluation Paradigm
Influences the techniques used, how data is analyzed
Identify Practical Issues
How to select users, stay on budget & schedule
How to find evaluators, select equipment
DECIDE Framework
Decide on Ethical Issues
Informed consent form
Participants have a right to:
- know the goals of the study and what will happen to the findings
- privacy of personal information
Evaluate, Interpret and Present Data
- Reliability: can the study be replicated?
- Validity: is it measuring what you thought?
- Biases: is the process creating biases?
- Scope: can the findings be generalized?
- Ecological validity: is the environment influencing the results?
Usability Testing
Pilot Studies
A small trial run of the main study.
Can identify majority of issues with interface design
Pilot studies check:
- that the evaluation plan is viable
- you can conduct the procedure
- that interview scripts, questionnaires, experiments, etc. work appropriately
Iron out problems before doing the main study.
Controlled experiments
Designer of a controlled experiment should carefully consider
proposed hypothesis
selected subjects
measured variables
experimental methods
data collection
data analysis
Variables
Experiments manipulate and measure variables under controlled conditions
There are two types of variables
independent: variables that are manipulated to create different experimental conditions
- e.g. number of items in menus, colour of the icons
dependent: variables that are measured to find out the effects of changing the independent variables
- e.g. speed of menu selection, speed of locating icons
Test Conditions
The levels, values, or settings for an independent variable
Example
- test conditions: HMD, Handheld device 1, Handheld device 2
“Other” Variables
Control variables
e.g. room light, noise…
if controlled => less external validity
Random variables (not controlled)
e.g. fatigue
more influence of random variable => less internal validity
Confounding variables
e.g. practice, previous experience
Hypothesis
A hypothesis is a prediction of the outcome
what will happen to the dependent variables when the independent variables are changed
The experiment aims to show that the prediction is right
- null hypothesis (H0): the dependent variables don’t change when the independent variables are changed
- the prediction is supported by rejecting the null hypothesis (H0)
Experimental methods
It is important to select the right experimental method so that the results of the experiment can be generalized
There are mainly two experimental methods
between-groups: each subject is assigned to one experimental condition
within-groups: each subject performs under all the different conditions
Experimental methods: Between-groups vs. Within-groups
[Figure: two flow diagrams. Between-groups: subjects are randomly assigned to Condition 1, Condition 2 or Condition 3, perform the experimental task, and the data from each condition goes to statistical data analysis. Within-groups: subjects are randomly assigned to different orders of Conditions 1-3, perform the experimental tasks under every condition, and the data goes to statistical data analysis.]
Within vs. Between Subjects
between subjects design
each participant is tested on only one level/condition
a separate group of participants is used for each condition
- e.g. one group uses HMD, the other group uses Handheld device
within subjects design
each participant is tested on each level/condition
- e.g. participants use Handheld device and HMD
repeated measurement
Between Subjects
Sometimes a factor must be between subjects
e.g. gender, age, experience
Between subjects advantage: avoids interference effects (e.g. practice / learning effect)
Between subjects disadvantage: increased variability = need more subjects
Important: randomised assignment to conditions
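Randomised assignment can be scripted so that it is reproducible. The following is a minimal sketch, assuming hypothetical participant IDs and the HMD / Handheld conditions used as examples in this lecture; the function name and the seed are illustrative only.

```python
import random

def assign_between_subjects(participants, conditions, seed=None):
    """Randomly assign each participant to exactly one condition,
    keeping the group sizes as equal as possible."""
    rng = random.Random(seed)        # a seed makes the assignment repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, participant in enumerate(shuffled):
        # deal the shuffled participants out to the conditions in turn
        groups[conditions[i % len(conditions)]].append(participant)
    return groups

if __name__ == "__main__":
    ids = [f"P{i:02d}" for i in range(1, 13)]   # hypothetical participant IDs
    for condition, members in assign_between_subjects(ids, ["HMD", "Handheld"], seed=42).items():
        print(condition, members)
```

Shuffling first and then dealing round-robin keeps the groups balanced, which a per-participant coin flip would not guarantee.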
Within Subjects
Sometimes a factor must be within subjects
e.g. measuring learning effects
Within subjects advantages
fewer participants needed (all participants in all conditions)
differences (variability) between subjects are the same across test conditions
Counterbalance order of presenting conditions
A => B => C
B => C => A
C => A => B
The order is best governed by a Latin Square
Latin Square Design
each condition occurs once in each row and column
Note: In a balanced Latin Square each condition both precedes and follows each other condition an equal number of times
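A balanced Latin Square can be generated rather than written by hand. The sketch below uses a common construction (first row 0, 1, n-1, 2, n-2, ..., later rows shifted by one; for an odd number of conditions the mirrored rows are added as well). It is one possible construction, not necessarily the one used in the studies described here, and the function name is illustrative.

```python
def balanced_latin_square(conditions):
    """Return presentation orders (one per row) in which every condition
    immediately precedes and follows every other condition equally often."""
    n = len(conditions)
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    take_low = True
    while len(first) < n:
        if take_low:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
        take_low = not take_low
    # Each further row shifts every entry of the first row by one (mod n).
    rows = [[(x + shift) % n for x in first] for shift in range(n)]
    # With an odd number of conditions, balance needs the reversed rows too.
    if n % 2 == 1:
        rows += [list(reversed(r)) for r in rows]
    return [[conditions[i] for i in row] for row in rows]

if __name__ == "__main__":
    for order in balanced_latin_square(["A", "B", "C", "D"]):
        print(" => ".join(order))
```

Participants are then assigned to rows in rotation, so the number of participants is ideally a multiple of the number of rows.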
Subjects
The choice of subjects is critical to the validity of the results of an experiment
subjects group should be representative of the expected user population
In selecting the subjects it is important to consider things such as their
age group, education, skills, culture
How does the sample influence the results?
Report the selection criteria and give relevant demographic information in your publication
Subjects
How many participants?
How big is the effect you want to measure?
- large effects can be detected with smaller samples
- e.g. small n needed to discriminate speed between turtles and rabbits
The more participants the “smoother” the data
- Central Limit Theorem: as n increases (n>30) the distribution of the sample mean approaches a normal distribution
- extreme data has less influence (e.g. one sleepy participant does not mess up the results that much)
for quantitative analysis: rule of thumb MINIMUM 15-20 or more per group/cell
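The "smoother data" point can be illustrated with a small simulation. This is a sketch under stated assumptions: task times are drawn from an exponential distribution with a made-up mean of 10 seconds, purely to have skewed data; none of these numbers come from the lecture.

```python
import random
import statistics

def spread_of_sample_means(n, trials=2000, mean_time=10.0, seed=0):
    """Standard deviation of the sample mean over many simulated studies
    with n participants each (hypothetical, skewed task times)."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.expovariate(1.0 / mean_time) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

if __name__ == "__main__":
    for n in (5, 15, 30, 60):
        print(f"n = {n:2d}: spread of the sample mean = {spread_of_sample_means(n):.2f} s")
```

The spread of the sample mean shrinks roughly with the square root of n, which is why a single extreme participant matters less in larger groups.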
Data Collection and Analysis
The choice of a method is dependent on the type of data that needs to be collected
In order to test a hypothesis the data has to be analysed using a statistical method
The choice of a statistical method depends on the type of collected data
All the decisions about an experiment should be made before it is carried out
Observe and Measure
Observations are gathered…
manually (human observers)
automatically (computers, software, cameras, sensors, etc.)
A measurement is a recorded observation
Objective metrics
Subjective metrics
Typical objective metrics
task completion time
errors (number, percent, …)
percent of task completed
ratio of successes to failures
number of repetitions
number of commands used
number of failed commands
physiological data (heart rate, …)
…
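Several of these objective metrics fall out directly from per-trial records. A minimal sketch, assuming a hypothetical `Trial` record and made-up values; the field names are illustrative and not taken from any particular logging tool.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trial:
    participant: str
    condition: str
    seconds: float      # time spent on the task
    errors: int         # number of errors observed
    completed: bool     # whether the task was finished

def summarise(trials, condition):
    """Objective metrics for one test condition."""
    rows = [t for t in trials if t.condition == condition]
    done = [t for t in rows if t.completed]
    return {
        # completion time only makes sense for completed trials
        "mean_completion_time": mean(t.seconds for t in done) if done else None,
        "errors_per_trial": mean(t.errors for t in rows),
        "percent_completed": 100.0 * len(done) / len(rows),
    }

if __name__ == "__main__":
    trials = [  # hypothetical data
        Trial("P01", "HMD", 12.0, 1, True),
        Trial("P02", "HMD", 18.0, 0, True),
        Trial("P03", "HMD", 30.0, 3, False),
        Trial("P04", "HMD", 15.0, 0, True),
    ]
    print(summarise(trials, "HMD"))
```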
Typical subjective metrics
user satisfaction
subjective performance
ratings
ease of use
intuitiveness
judgments
…
Data Types
Subjective
Subjective survey
- Likert Scale, condition rankings
- e.g. “How easy was the task?” rated 1 2 3 4 5 (1 = Not very easy, 5 = Very easy)
Observations
- Think Aloud
Interview responses
Objective
Performance measures
- Time, accuracy, errors
Process measures
- Video/audio analysis
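Likert responses are ordinal, so a median and a count per response option are a safe first summary. A minimal sketch with made-up responses to the "How easy was the task?" question; the function name is illustrative.

```python
from collections import Counter
from statistics import median

def summarise_likert(responses, scale=(1, 5)):
    """Median and per-option counts for one Likert item."""
    low, high = scale
    if any(not low <= r <= high for r in responses):
        raise ValueError("response outside the scale")
    counts = Counter(responses)
    return {
        "median": median(responses),
        # include options nobody chose, so the distribution is complete
        "counts": {value: counts.get(value, 0) for value in range(low, high + 1)},
    }

if __name__ == "__main__":
    answers = [4, 5, 3, 4, 2, 5, 4]   # hypothetical responses, 1 = not very easy, 5 = very easy
    print(summarise_likert(answers))
```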
Experimental Measures
Measure | What does it tell us? | How is it measured?
Timings | Performance | Via a stopwatch, or automatically by the device.
Errors | Performance. Particular sticking points in a task | By success in completing the task correctly. Through experimenter observation, examining the route walked.
Perceived workload | Effort invested. User satisfaction | Through NASA TLX scales and other questionnaires.
Distance traveled and route taken | Depending on the application, these can be used to pinpoint errors and to indicate performance | Using a pedometer, GPS or other location-sensing system. By experimenter observation.
Percentage preferred walking speed | Performance | By finding average walking speed, which is compared with normal walking speed.
Comfort | User satisfaction. Device acceptability | Comfort Rating Scale and other questionnaires.
User comments and preferences | User satisfaction and preferences. Particular sticking points in a task. | Through questionnaires, interviews and think-alouds.
Experimenter observations | Different aspects, depending on the experimenter and on the observations | Through observation and note-taking
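As a worked example of one row of the table, percentage preferred walking speed can be computed from the distance walked, the time taken, and the participant's normal walking speed. A minimal sketch with made-up numbers; the function name and units are assumptions.

```python
def percentage_preferred_walking_speed(distance_m, duration_s, preferred_speed_m_s):
    """Average walking speed during the task as a percentage of the
    participant's normal (preferred) walking speed."""
    average_speed = distance_m / duration_s
    return 100.0 * average_speed / preferred_speed_m_s

if __name__ == "__main__":
    # hypothetical participant: 120 m walked in 100 s, normal speed 1.5 m/s
    ppws = percentage_preferred_walking_speed(120.0, 100.0, 1.5)
    print(f"PPWS = {ppws:.0f}%")
```

A value well below 100% suggests that using the device slowed the participant down.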
Statistical Analysis
Once data is collected statistics can be used for analysis
Typical Statistical Techniques
Comparing between two results
- Unpaired T-Test (for between subjects; assumes normal distribution, interval scale, homogeneity of variances)
- Paired T-Test (for within subjects; assumes normal distribution, etc.)
- Mann-Whitney U-test (between subjects; if assumptions are not met)
Comparing between > two results
- Analysis of Variance (ANOVA)
- Followed by post-hoc analysis with Bonferroni adjustment
- Kruskal-Wallis (does not assume normal distribution)
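To make the parametric tests concrete, the sketch below computes the test statistics and degrees of freedom using only the Python standard library. It stops short of p-values, which need the t or F distribution (from tables or a statistics package). The unpaired test shown is the pooled-variance Student's t, matching the homogeneity-of-variances assumption; the function names and data are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

def unpaired_t(a, b):
    """Student's t for two independent groups (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def paired_t(a, b):
    """t for two measurements taken on the same participants."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

def one_way_anova_f(*groups):
    """F ratio and degrees of freedom for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

def bonferroni_alpha(alpha, comparisons):
    """Per-comparison significance level for post-hoc tests."""
    return alpha / comparisons

if __name__ == "__main__":
    hmd, handheld, desktop = [1, 2, 3], [2, 3, 4], [3, 4, 5]   # hypothetical scores
    print(unpaired_t(hmd, handheld))
    print(paired_t(hmd, [2, 4, 3]))
    print(one_way_anova_f(hmd, handheld, desktop))
    print(bonferroni_alpha(0.05, 3))
```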
Running the study
Offload your Brain!
Write down instructions
prepare checklists
create templates
print and pitch important information
Try and find an assistant
Print questionnaires and other documents the day before
Rehearse procedures.
Bring your lunch, don’t forget to eat (4 kg in 2 weeks)
Running the study
Treat the participants nicely
Prepare candy and drinks and make them feel good.
Take the role of a friendly waiter:
Always stay in background but offer assistance if needed.
Take notes, document oddities.
Nothing is as bad as lost data!! AVOID
Running the study
Take many photos of your setup in action.
Prepare consent forms if you want to use pictures for publications.
Field Studies
Field Studies
Field studies are done in natural settings.
“In the wild” is a term for prototypes being used freely in natural settings.
Aim to understand what users do naturally and how technology impacts them.
Field studies are used in product design to:
- identify opportunities for new technology;
- determine design requirements;
- decide how best to introduce new technology;
- evaluate technology in use.
Observation
Direct observation in the field
Structuring frameworks
Degree of participation (insider or outsider)
Ethnography
Direct observation in controlled environments
Indirect observation: tracking users’ activities
Diaries
Interaction logging
Ethnography
• Ethnography is a philosophy with a set of techniques that include participant observation and interviews
• Ethnographers immerse themselves in the culture studied
• Need cooperation of people being studied
• A researcher’s degree of participation can vary along a scale from ‘outside’ to ‘inside’
• Analyzing video and data logs can be time-consuming
• Can use continuous data analysis
• Collections of comments, incidents, and artifacts are made
Direct observation in a controlled setting
Think-aloud technique
Indirect observation
Diaries
Interaction logs
Cultural probes
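An interaction log can be as simple as a list of timestamped events written out as CSV for later analysis. The sketch below is one hypothetical way to do this; the class, the column names and the event names are all assumptions, not part of any AR toolkit.

```python
import csv
import io
import time

class InteractionLog:
    """Collects timestamped user events for indirect observation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()       # all timestamps are relative to session start
        self.events = []

    def record(self, participant, event, detail=""):
        elapsed = round(self._clock() - self._start, 3)
        self.events.append((elapsed, participant, event, detail))

    def to_csv(self):
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(["t_seconds", "participant", "event", "detail"])
        writer.writerows(self.events)
        return buffer.getvalue()

if __name__ == "__main__":
    log = InteractionLog()
    log.record("P01", "view_changed", "AR -> map")      # hypothetical events
    log.record("P01", "target_selected", "cafe Alpha")
    print(log.to_csv())
```

Passing the clock in as a parameter keeps the logger easy to test and lets the same code run against a device clock.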
Structuring frameworks to guide observation
- The person. Who?
- The place. Where?
- The thing. What?
The Goetz and LeCompte (1984) framework:
- Who is present?
- What is their role?
- What is happening?
- Where is it happening?
- Why is it happening?
- How is the activity organized?
Predictive Evaluation
Predictive Models
Provide a way of evaluating products or designs without directly involving users.
Less expensive than user testing.
Usefulness limited to systems with predictable tasks
e.g., telephone answering systems, mobiles, etc.
Based on expert error-free behavior.
Fitts’ Law (Fitts, 1954)
Fitts’ Law predicts that the time to point at an object using a device is a function of the distance from the target object and the object’s size.
The further away and the smaller the object, the longer the time to locate it and point to it.
GOMS Model
Goals - the state the user wants to achieve, e.g., find a website.
Operators - the cognitive processes and physical actions needed to attain the goals
e.g., moving mouse to select icon
Methods - the procedures for accomplishing the goals, e.g., drag mouse over icon, click on button.
Selection rules - decide which method to select when there is more than one.
GOMS Response Times (Card et al., 1983)
Operator | Description | Time (sec)
K | Pressing a single key or button |
  | Average skilled typist (55 wpm) | 0.22
  | Average non-skilled typist (40 wpm) | 0.28
  | Pressing shift or control key | 0.08
  | Typist unfamiliar with the keyboard | 1.20
P | Pointing with a mouse or other device on a display to select an object. This value is derived from Fitts’ Law which is discussed below. | 0.40
P1 | Clicking the mouse or similar device | 0.20
H | Bring ‘home’ hands on the keyboard or other device | 0.40
M | Mentally prepare/respond | 1.35
R(t) | The response time is counted only if it causes the user to wait. | t
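A prediction is made by listing the operators a task needs and adding up their times. The sketch below uses the operator times listed for this table, assuming the average skilled typist value for K; the task breakdown ("select an icon") is a hypothetical example, not one from the lecture.

```python
# Operator times in seconds (Card et al., 1983); K assumes an average skilled typist.
KLM_TIMES = {"K": 0.22, "P": 0.40, "P1": 0.20, "H": 0.40, "M": 1.35}

def klm_estimate(operators, response_times=()):
    """Predicted expert, error-free task time: the sum of the operator
    times plus any system response times R(t) the user has to wait for."""
    return sum(KLM_TIMES[op] for op in operators) + sum(response_times)

if __name__ == "__main__":
    # hypothetical breakdown: think, move hand to mouse, point at icon, click
    select_icon = ["M", "H", "P", "P1"]
    print(f"Estimated time: {klm_estimate(select_icon):.2f} s")
```

Comparing two candidate designs then comes down to comparing their operator sequences.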
Expert Inspections
Several kinds
Experts use their knowledge of users and technology to review application usability.
Expert critiques can be formal or informal reports.
Heuristic evaluation is a review guided by a set of heuristics
e.g. Visibility of system status
Jakob Nielsen’s heuristics (1990s)
Walkthroughs involve stepping through a pre-planned scenario noting potential problems
e.g. load AR model, scale it to twice the size, add new model, etc.
Nielsen’s heuristics
Visibility of system status.
Match between system and real world.
User control and freedom.
Consistency and standards.
Error prevention.
Recognition rather than recall.
Flexibility and efficiency of use.
Aesthetic and minimalist design.
Help users recognize, diagnose, recover from errors.
Help and documentation.
Three Stages for Doing Heuristic Evaluation
1/ Briefing session to tell experts what to do.
2/ Evaluation period of 1-2 hours in which:
Each expert works separately;
Take one pass to get a feel for the product;
Take a second pass to focus on specific features.
3/ Debriefing session in which experts work together to prioritize problems.
No. of evaluators & problems
Advantages and Problems
Few ethical and practical issues to consider because users not involved.
Can be difficult and expensive to find experts.
Best experts have knowledge of application domain and users.
Biggest problems:
Important problems may get missed;
Many trivial problems are often identified;
Experts have biases.
Case Studies
Types of AR Experiments
Perception
How is virtual content perceived?
What perceptual cues are most important?
Interaction
How can users interact with virtual content?
Which interaction techniques are most efficient?
Collaboration
How is collaboration in an AR interface different?
Which collaborative cues can be conveyed best?
Perception
Central goal of AR systems is to fool the human perceptual system
Display Modes
Direct View
Stereo Video
Stereo graphics
Multi-modal display
Different objects with different display modes
Potential for depth cue conflict
Perceptual User Studies
Depth / Distance Studies
Estimate distance to object
Judge relative proximity
Object localization
Match physical and virtual object positions
Difficulties
Precise alignment / calibration of displays
Lag in head tracking (use static images)
Layar – www.layar.com
Outdoor AR: Limited Field of View
Possible solutions
Overview + Detail
spatial separation; two views
Focus + Context
merges both views into one view
Zooming
temporal separation
Zooming Views
TU Graz - HIT Lab NZ collaboration
Zooming Panorama
Zooming Map
Zooming AR interfaces
Context Compass, Zooming Panorama, Zooming Map
Interface Types
Compass (C)
Compass + Zooming Panorama (CP)
Compass + Zooming Map (CM)
Compass, Zooming Panorama, Zooming Map (CPM)
Experiment Evaluation
20 subjects (10 M / 10 F)
Café finding task
Task 1: Find particular café named “Alpha”
Task 2: Find closest café
Experiment measures
Time to complete task
Angular distance panned around
Subjective survey feedback
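A measure like angular distance panned can be derived from logged compass headings by summing the absolute heading changes, taking care of the wrap-around at 360 degrees. This is a sketch of one plausible way to compute it, assuming headings are sampled often enough that no step exceeds 180 degrees; how the study itself computed the measure is not stated in the slides.

```python
def angular_distance_panned(headings_deg):
    """Total rotation in degrees over a sequence of compass headings,
    always taking the shorter way round between consecutive samples."""
    total = 0.0
    for previous, current in zip(headings_deg, headings_deg[1:]):
        # map the raw difference into the range [-180, 180)
        delta = (current - previous + 180.0) % 360.0 - 180.0
        total += abs(delta)
    return total

if __name__ == "__main__":
    samples = [350.0, 10.0, 30.0, 20.0]   # hypothetical heading log in degrees
    print(angular_distance_panned(samples))
```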
Performance Time
Distance Panned
Results
Compass good for search, but not comparison
Zooming (P or M) aids comparison
Information has significant effect
Compass requires more panning
Users felt compass alone wasn’t useful
Interaction Studies
Stages of Interface Development
• Prototype Demonstration
• Adoption of interaction techniques from other interface metaphors
• Development of new interface metaphors appropriate to the medium
• Development of formal theoretical models for predicting and modeling user interactions
Fitts’ Law (Fitts, 1954)
Relates Movement Time to Index of Difficulty
$MT = a + b \log_2(2A/W)$
where $\log_2(2A/W) = ID$, $A$ is the distance (amplitude) to the target and $W$ is the target width
Robust under most circumstances
object tracking, tapping tasks, movement tasks
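The formula MT = a + b log2(2A/W) can be evaluated directly once a and b are known. In practice a and b are fitted from measured movement times for a given device; the values below are made up purely to show the calculation.

```python
from math import log2

def index_of_difficulty(amplitude, width):
    """ID = log2(2A / W), in bits."""
    return log2(2 * amplitude / width)

def movement_time(amplitude, width, a, b):
    """Fitts' Law prediction MT = a + b * ID."""
    return a + b * index_of_difficulty(amplitude, width)

if __name__ == "__main__":
    a, b = 0.1, 0.2   # hypothetical device constants (seconds, seconds per bit)
    for amplitude, width in [(8, 2), (16, 2), (16, 1)]:
        print(amplitude, width,
              f"ID = {index_of_difficulty(amplitude, width):.1f} bits,",
              f"MT = {movement_time(amplitude, width, a, b):.2f} s")
```

Doubling the distance or halving the target width each adds one bit of difficulty, and therefore b seconds of predicted movement time.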
Interaction Study - Reaching
Mason, A. et al. (2001). Reaching Movements to Augmented and Graphic Objects in Virtual Environments. Proc. CHI 2001.
Does Fitts’ Law hold in an acquisition task?
Does Fitts’ Law hold when reaching for virtual objects?
Does Fitts’ Law hold when you can’t see your hand?
Experimental Setup
Enhanced Virtual Hand Lab
Half Silvered Mirror
Shutter Glasses
OPTOTRAK optical tracker
IREDs worn on wrist, object
Four target cubes
Conditions:
Cube size, arm visibility, real/virtual objects
Kinematic Measures
Movement Time
Peak Velocity of Wrist
Time to Peak Velocity of the Wrist
Percent Time from Peak Velocity of the Wrist
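Measures of this kind are typically derived from tracked wrist positions. The sketch below is an illustration under stated assumptions, not the analysis used in the cited study: it estimates speed by finite differences between consecutive samples and reads "percent time from peak velocity" as the share of the movement time spent after the velocity peak.

```python
from math import dist

def kinematic_measures(samples):
    """samples: list of (t_seconds, (x, y, z)) wrist positions in time order."""
    speeds = []
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # speed over the interval, stamped at the interval midpoint
        speeds.append(((t0 + t1) / 2, dist(p0, p1) / (t1 - t0)))
    start = samples[0][0]
    movement_time = samples[-1][0] - start
    t_peak, peak_velocity = max(speeds, key=lambda s: s[1])
    time_to_peak = t_peak - start
    return {
        "movement_time": movement_time,
        "peak_velocity": peak_velocity,
        "time_to_peak_velocity": time_to_peak,
        "percent_time_from_peak_velocity": 100.0 * (movement_time - time_to_peak) / movement_time,
    }

if __name__ == "__main__":
    reach = [  # hypothetical samples: time in seconds, position in cm
        (0.0, (0.0, 0.0, 0.0)), (0.1, (1.0, 0.0, 0.0)), (0.2, (4.0, 0.0, 0.0)),
        (0.3, (6.0, 0.0, 0.0)), (0.4, (6.5, 0.0, 0.0)),
    ]
    print(kinematic_measures(reach))
```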
Results – Movement Time
Results – Velocity Profiles
AR Navigation
Many commercial AR browsers
Information in place
How to navigate to POI
2D vs. AR Navigation?
AR Navigation Study
Users navigate between Points of Interest
Three conditions
AR: Using only an AR view
2D-map: Using only a top down 2D map view
AR+2D-map: Using both an AR and 2D map view
Experiment Measures
Quantitative
- Time taken, Distance travelled
Qualitative
- Experimenter observations, Navigation behavior, Interviews
- User surveys, workload (NASA TLX)
HIT Lab NZ Test Platform – AR View
HIT Lab NZ Platform – Map View
Distance and Time
No significant differences
Paths Travelled
Red - AR
Blue - AR + Map
Yellow - Map
Navigation Behaviour
Depends on interface
Map doesn’t show short cuts
Survey Responses
User Comments
AR
“you don't know exactly where you are all of the time.”
“using AR I found it difficult to see where I was going”
Map
“you were able to get a sense of where you were”
“you are actually able to see the physical objects around you”
AR+MAP
“I used the map at the beginning to understand where the buildings were and the AR between each point”
“You can choose a direction with AR and find the shortest way using the map.”
Usability Issues
Screen readability in sunlight
GPS inaccuracies
Compass errors
Touch screen difficulties
No routing information
Lessons Learned
Users adapt navigation behaviour to guide type
AR interface shows shortcuts
Map interface good for planning
Include map view in AR interface
2D exocentric, and 3D egocentric
Allow people to easily change between views
May use Map far away, AR close
Difficult to accurately show depth
Collaboration Studies
Remote Conferencing
Face to Face Collaboration
Remote AR Conferencing
Moves conferencing from the desktop to the workspace
Pilot Study
How does AR conferencing differ?
Task
discussing images
12 pairs of subjects
Conditions
audio only (AC)
video conferencing (VC)
mixed reality conferencing (MR)
Sample Transcript
Transcript Analysis
Users speak most in Audio Only condition
MR fewest words/min and interruptions/min
More results needed
Presence and Communication
[Figure: two bar charts comparing the AC, VC and MR conditions: “Presence Rating (0-100)” and “Could tell when Partner was Concentrating” (axis 0-14)]
Subjective Comments
Paid more attention to pictures
Remote video provided peripheral cues
In AR condition
Difficult to see everything
Remote user distracting
Communication asymmetries
Face to Face Collaboration
Compare two person collaboration in:
Face to Face, AR, Projection Display
Task
Urban design logic puzzle
- Arrange 9 buildings to satisfy 10 rules in 7 minutes
Subjects
Within subjects study (counter-balanced)
12 pairs of college students
Face to Face Condition
Moving Model Buildings
AR Condition
Cards with AR Models
SVGA AR Display (800x600)
Video see-through AR
Projection Condition
Tracked Input Devices
Task Space Separation
Interface Conditions
               | FtF | AR | Projection
User Viewpoint | Independent; easy to change | Private; independent; easy to change; limited FOV | Public; common; difficult to change
Interaction    | Two handed; natural object manipulation; space-multiplexed | Two handed; Tangible AR techniques; space-multiplexed | Mouse-based; one-handed; time-multiplexed
Hypothesis
Collaboration with AR technology will produce behaviors that are more like natural face-to-face collaboration than those produced by using a screen-based interface.
Metrics
Subjective
Evaluative survey after each condition
Forced-choice survey after all conditions
Post experiment interview
Objective
Communication measures
- Video transcription
Measured Results
Performance
AR collaboration slower than FtF + Projection
Communication
Pointing/Picking gesture behaviors same in AR as FtF
Deictic speech patterns same in AR as FtF
- Both significantly different from the Projection condition
Subjective
FtF easier to work together and understand
Interaction in AR easier than Proj. and same as FtF
Deictic Expressions
[Figure: bar chart of deictic expressions (axis 0%-30%) for the FtF, Proj and AR conditions]
Significant difference: ANOVA, F(2,33) = 5.77, P < 0.01
No difference between FtF and AR
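The two numbers in F(2,33) are degrees of freedom. For a one-way ANOVA they follow from the number of conditions and the number of observations. The sketch below shows the arithmetic under one reading that is consistent with the reported value: one observation per pair per condition, so 12 pairs x 3 conditions = 36 observations. The slides do not state how the analysis was actually set up, so this is an assumption.

```python
def one_way_anova_df(n_conditions, n_observations):
    """Degrees of freedom (between, within) for a one-way ANOVA."""
    return n_conditions - 1, n_observations - n_conditions

if __name__ == "__main__":
    pairs, conditions = 12, 3    # 12 pairs, FtF / Projection / AR
    print(one_way_anova_df(conditions, pairs * conditions))
```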
Ease of Interaction
Significant difference
Pick - F(2,69) = 37.8, P < 0.0001
Move - F(2,69) = 28.4, P < 0.0001
Interview Comments
“AR’s biggest limit was lack of peripheral vision. The interaction was natural, it was just difficult to see. In the projection condition you could see everything but the interaction was tough”
Face to Face
Subjects focused on task space
- gestures easy to see, gaze difficult
Projection display
Interaction difficult (8/14)
- not mouse-like, invasion of space
AR display: “working solo together”
Lack of peripheral cues = “tunnel vision” (10/14 people)
Face to Face Summary
Collaboration is partly a Perceptual task
AR reduces perceptual cues -> Impacts collaboration
Tangible AR metaphor enhances ease of interaction
Users felt that AR collaboration was different from FtF
But:
measured speech and gesture behaviors in the AR condition are more similar to the FtF condition than to the Projection display
Thus we need to design AR interfaces that don’t reduce perceptual cues, while keeping ease of interaction
Case Study: A Wearable Information Space
Head Stabilized Body Stabilized
An AR interface provides spatial audio and visual cues
Does a spatial interface aid performance?
- Task time / accuracy
M. Billinghurst, J. Bowskill, N. Dyer, J. Morphett (1998). An Evaluation of Wearable Information Spaces. Proc. Virtual Reality Annual International Symposium.
Task Performance
Task
find target icons on 8 pages
remember information space
Conditions
A - head-stabilized pages
B - cylindrical display with trackball
C - cylindrical display with head tracking
Subjects
Within subjects (need fewer subjects)
12 subjects used
Experimental Measures (Many Different Measures)
Objective
spatial ability (pre-test)
time to perform task
information recall
workload (NASA TLX)
Subjective
Post Experiment Survey
- rank conditions (forced choice)
- Likert Scale Questions
• “How intuitive was the interface to use?”
Post Experiment Survey
For each of these conditions please answer:
1) How easy was it to find the target?
1 2 3 4 5 6 7 (1 = not very easy, 7 = very easy)
For the head stabilised condition (A):
For the cylindrical condition with mouse input (B):
For the head tracked condition (C):
Rank all the conditions in order on a scale of one to three
1) Which condition was easiest to find target (1 = easiest, 3 = hardest)
A: B: C:
Results
Body Stabilization Improved Performance
search times significantly faster (One factor ANOVA)
Head Tracking Improved Information recall
no difference between trackball and stack case
Head tracking involved more physical work
Subjective Impressions
[Figure: bar chart of ratings (axis 0-5) for “Find Target” and “Enjoyable” across conditions A, B, C]
Subjects Felt Spatialized Conditions (ANOVA):
More enjoyable
Easier to find target
Subjective Impressions
[Figure: bar chart of rankings (axis 0-3) for “Easiest”, “Understanding” and “Intuitive” across conditions A, B, C]
Subject Rankings (Kruskal-Wallis)
Spatialized easier to use than head stabilized
Body stabilized gave better understanding
Head tracking most intuitive
Conclusions
Key Points
• There is a need for more user evaluation of AR experiences
• There are several evaluation approaches that can be used
• ‘quick and dirty’
• usability testing (lab studies)
• field studies
• predictive evaluation
• Studies should use multiple qualitative and quantitative experimental measures.
Resources
Online Resources
Meta-site for Statistical Analysis
http://home.ubalt.edu/ntsbarsh/stat-data/Topics.htm
Online Statistical Analysis
http://www.quantitativeskills.com/sisa/
Experiment Design
http://en.wikipedia.org/wiki/Design_of_experiments
http://www.curiouscat.net/library/designofexperiments.cfm
Books
J. Nielsen. “Usability Engineering”, Academic Press, 1993.
H. Sharp, Y. Rogers, J. Preece. “Interaction Design: Beyond Human-computer Interaction”, John Wiley & Sons, 2007
J. Spool, J. Rubin, D. Chisnell. “Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests”, John Wiley & Sons, 2008
T. Tullis, B. Albert. “Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics”, Morgan Kaufmann, 2008
A. Field, G. Hole. “How to Design and Report Experiments”, Sage Publications Ltd, 2003